Abstract


Compilers for monomorphic languages, such as C and Pascal, take advantage of types
to determine data representations, alignment, calling conventions, and register selection.
However, these languages lack important features including polymorphism, abstract
datatypes, and garbage collection. In contrast, modern programming languages such as
Standard ML (SML) provide all of these features, but existing implementations fail to
take full advantage of types. The result is that the performance of SML code is quite bad
when compared to C.

In  this  thesis,  I  provide  a  general  framework,  called  type-directed  compilation, 
that  allows  compiler  writers  to  take  advantage  of  types  at  all  stages  in  compilation.  In 
the  framework,  types  are  used  not  only  to  determine  efficient  representations  and  calling 
conventions, but also to prove the correctness of the compiler. A key property of
type-directed compilation is that all but the lowest levels of the compiler use typed intermediate
languages.  An  advantage  of  this  approach  is  that  it  provides  a  means  for  automatically 
checking  the  integrity  of  the  resulting  code. 

An important contribution of this work is the development of a new, statically-typed
intermediate language, called λ_i^ML. This language supports dynamic type dispatch,
providing  a  means  to  select  operations  based  on  types  at  run  time.  I  show  how  to 
use  dynamic  type  dispatch  to  support  polymorphism,  ad-hoc  operators,  and  garbage 
collection  without  having  to  box  or  tag  values.  This  allows  compilers  for  SML  to  take 
advantage  of  techniques  used  in  C  compilers,  without  sacrificing  language  features  or 
separate  compilation. 

To  demonstrate  the  applicability  of  my  approach,  I,  along  with  others,  have 
constructed  a  new  compiler  for  SML  called  TIL  that  eliminates  most  restrictions  on  the 
representations  of  values.  The  code  produced  by  TIL  is  roughly  twice  as  fast  as  code 
produced  by  the  SML/NJ  compiler.  This  is  due  at  least  partially  to  the  use  of  natural 
representations, but primarily to the conventional optimizer, which manipulates typed
λ_i^ML code. TIL demonstrates that combining type-directed compilation with dynamic
type  dispatch  yields  a  superior  architecture  for  compilers  of  modern  languages. 
Chapter 1
Introduction 


The goal of my thesis is to show that types can and should be used throughout implementations
of modern programming languages. More specifically, I claim that, through
the use of type-directed translation and dynamic type dispatch (explained below), we can
compile polymorphic, garbage-collected languages, such as Standard ML [90], without
sacrificing natural data representations, efficient calling conventions, or separate compilation.
Furthermore, I claim that a principled language implementation based on types
lends itself to proofs of correctness, as well as tools that automatically verify the integrity
of the implementation. In short, compiling with types yields both safety and
performance.

Traditionally, compilers for low-level, monomorphic languages, such as C and Pascal,
have taken advantage of the invariants guaranteed by types to determine data representations,
alignment, calling conventions, register selection, and so on. For example, when
allocating space for a record, a C compiler can determine the size of the record from its
type. When allocating a register for a variable of type double, a C compiler will use a
floating point register instead of a general purpose register. Some implementations take
advantage of types to support tag-free garbage collection [23, 119, 6] and so-called
“conservative” garbage collection [21]. Types are also used to support debugging, printing
and parsing, marshaling, and other means of traversing a data structure.
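The way a monomorphic compiler reads sizes and register classes off of types can be sketched in a few lines of SML. This is only an illustration: the datatype ty, the byte sizes, and the names sizeOf and regClassOf are assumptions made for the sketch, and padding and alignment are ignored.

```sml
(* Hypothetical source types for a tiny monomorphic language. *)
datatype ty = Int | Double | Ptr | Record of ty list

(* Assumed sizes in bytes; a real compiler takes these from the target ABI
   and would also account for alignment and padding. *)
fun sizeOf Int = 4
  | sizeOf Double = 8
  | sizeOf Ptr = 8
  | sizeOf (Record fields) = foldl (fn (t, n) => sizeOf t + n) 0 fields

datatype regClass = General | Floating

(* A scalar's register class follows from its type alone; we assume that
   everything other than a double travels in a general purpose register. *)
fun regClassOf Double = Floating
  | regClassOf _ = General

val pointSize = sizeOf (Record [Double, Double, Int])   (* 8 + 8 + 4 *)
val xClass = regClassOf Double
```

Both functions are total only because every type is known when they run, which is exactly the property that variable types take away.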

In addition to directing implementation, types are useful for proving formal properties
of programs. For instance, it is possible to prove that every term in the simply-typed
λ-calculus terminates. Similarly, it is possible to show that there is no closed value in
the Girard-Reynolds polymorphic λ-calculus with the type ∀α.α. Compilers can take
advantage of these properties to produce better code. For instance, a compiler can
determine that a function that takes an argument of type ∀α.α will never be called,
simply because there are no values of the argument type. Therefore, the compiler can
safely eliminate the function.

Types  are  also  useful  for  proving  relations  between  programs.  In  particular,  types 
are useful for showing that two programs are equivalent, in the sense that they compute
equivalent  values.  This  provides  a  powerful  mechanism  for  proving  that  a  compiler  is 
correct:  Establish  a  set  of  simulation  relations  based  on  types,  and  then  show  that  every 
source  program  and  its  translation  are  in  the  relation  given  by  the  source  program’s  type. 

Unfortunately,  two  obstacles  have  prevented  language  implementors  from  taking  full 
advantage  of  types.  The  first  obstacle  is  that  implementors  have  lacked  a  sufficiently 
powerful,  yet  convenient  framework  for  formally  expressing  their  uses  of  types.  Instead, 
implementors  rely  upon  informal,  ad  hoc  specifications  of  typing  properties.  This  in  turn 
prevents the implementor from proving formal properties based on types, and from recognizing
opportunities for better implementation techniques within a compiler or runtime
system.

A  good  example  of  this  issue  comes  from  the  literature  on  type-based,  tag-free  garbage 
collection, where we find many descriptions of clever schemes for maintaining type information
at run time to support memory management. Many of the approaches are
surprisingly  difficult  to  implement  and  rely  upon  very  subtle  typing  properties.  Yet,  few 
if  any  of  these  descriptions  are  formal  in  any  sort  of  mathematical  sense.  Indeed,  the 
basic  definitions  of  program  evaluation  and  what  it  means  for  a  value  to  be  garbage 
are  at  best  described  informally,  and  at  worst  left  unstated.  Consequently,  we  have  no 
guarantee  that  the  algorithms  are  in  any  way  correct.  Practically  speaking,  this  keeps 
us  from  modifying  or  adapting  the  algorithms  with  any  assurance,  simply  because  the 
necessary  invariants  are  left  implicit. 

The  second  obstacle  keeping  implementors  from  fully  taking  advantage  of  types  is 
that  types  have  become  complex,  relative  to  the  simple  monomorphic  type  systems  of 
C  and  Pascal.  To  support  better  static  type  checking,  abstraction,  code  reuse,  separate 
compilation,  and  other  software  engineering  practices,  we  have  evolved  from  using  simple 
monomorphic types, to using the complex, modern types of Standard ML (SML), Haskell,
and  Quest.  These  modern  types  include  polymorphic  types,  abstract  types,  object  types, 
module  types,  qualified  types,  and  even  dependent  types.  These  kinds  of  types  have  one 
thing  in  common:  They  include  component  types  that  are  unknown  at  compile  time.  In 
fact,  many  of  these  types  contain  components  that  are  variable  at  run  time. 

Most  of  the  type-based  implementation  techniques  used  in  compilers  for  C  and  Pascal 
rely  critically  upon  knowing  the  type  of  every  object  at  compile  time.  Implementors  have 
lacked  a  sufficiently  general  approach  for  extending  these  techniques  to  cover  unknown 
or  variable  types.  Because  of  this,  compilers  for  languages  like  SML,  which  have  variable 
types, have traditionally ignored type information and treated the language as if it
were uni-typed. Consequently and ironically, implementations of modern languages suffer
performance  problems  because  they  provide  advanced  types,  but  fail  to  take  advantage 
of  simple  types. 

The  purpose  of  this  thesis  is  to  remove  these  two  obstacles  and  open  the  path  for 
language  implementors  to  take  full  advantage  of  types.  To  address  the  first  obstacle, 
lack of formalism, I demonstrate how to formalize key compilation phases and a run-time
system using a methodology called type-directed translation. This methodology provides
a  unifying  framework  for  specifying  and  proving  the  correctness  of  a  compiler  that  takes 
full  advantage  of  types.  However,  to  compile  languages  like  SML,  the  methodology  of 
type-directed  translation  requires  some  formal  mechanism  for  dealing  with  the  second 
obstacle  —  variable  types. 

To address variable types, I provide a new compilation technique, dynamic type dispatch,
that extends traditional approaches for compiling monomorphic languages to handle
modern types. In principle, dynamic type dispatch has none of the drawbacks of
previous  approaches,  but  it  introduces  some  issues  of  its  own.  In  particular,  to  take 
full  advantage  of  dynamic  type  dispatch,  we  must  propagate  type  information  through 
each  phase  of  a  compiler.  Fortunately,  type-directed  translation  provides  a  road  map  for 
achieving  this  goal. 

In  short,  the  two  contributions  of  this  thesis,  type-directed  translation  and  dynamic 
type  dispatch,  are  equally  important  because  they  rely  critically  upon  each  other.  To 
demonstrate  the  practicality  of  these  two  techniques,  I  (with  others)  have  constructed  a 
compiler  for  SML  called  TIL.  TIL,  which  stands  for  Typed  Intermediate  Languages,  takes 
advantage  of  both  type-directed  translation  and  dynamic  type  dispatch  to  provide  natural 
representations  and  calling  conventions.  The  type-directed  transformations  performed  by 
TIL  reduce  running  times  by  roughly  40%  and  heap  allocation  by  50%. 

In  addition  to  type-directed  translation  and  dynamic  type  dispatch,  TIL  employs  a 
set  of  conventional  functional  language  optimizations.  These  optimizations  account  for 
much of the good performance of TIL, in spite of the fact that they operate on
statically-typed intermediate languages. Indeed, TIL produces code that is roughly twice as fast as
code  produced  by  the  SML/NJ  compiler  [12],  which  is  one  of  the  best  existing  compilers 
for  Standard  ML. 

The  rest  of  this  chapter  serves  as  an  overview  of  the  thesis.  In  Section  1.1,  I  give  an 
overview  of  type-directed  translation.  In  Section  1.2,  I  discuss  the  problem  of  compiling 
in  the  presence  of  variable  types,  discuss  previous  approaches  to  this  problem,  and  show 
why  these  solutions  are  inadequate.  In  Section  1.3,  I  give  an  overview  of  dynamic  type 
dispatch and discuss why it is superior to previous approaches to compiling in the
presence  of  variable  types.  In  Section  1.4,  I  discuss  the  key  issue  of  using  dynamic  type 
dispatch  in  conjunction  with  type-directed  translation  —  how  to  type  check  a  language 
that  provides  dynamic  type  dispatch.  Finally,  in  Section  1.5,  I  give  a  comprehensive 
overview  of  the  rest  of  the  thesis. 
1.1  Type Directed Translation

A  compiler  transforms  a  program  in  a  source  language  into  a  program  in  a  target  language. 
Usually,  we  think  of  the  target  language  as  more  “primitive”  than  the  source  language 
according  to  functionality.  As  an  example,  consider  the  compilation  of  SML  programs 
to  a  lower-level  language  such  as  C.  We  consider  C  “lower-level”  because  SML  provides 
features  that  C  does  not  directly  provide,  such  as  closures,  exceptions,  automatic  memory 
management,  algebraic  data  types,  and  so  forth.  These  high-level  constructs  must  be 
encoded  into  constructs  that  C  does  support.  For  instance,  closures  can  be  encoded  in  C 
as  a  struct  containing  both  a  pointer  to  a  C  function  and  a  pointer  to  another  struct 
that  contains  values  of  the  free  variables  in  the  closure. 
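The same encoding can be sketched within SML itself, with records standing in for the C structs. The function addN and all of the names below are hypothetical, chosen only to illustrate how a function with a free variable becomes closed code paired with an environment.

```sml
(* Source: fun addN n = fn x => x + n
   The inner function has one free variable, n. *)

(* Environment record holding the values of the free variables. *)
type env = {n : int}

(* Closed "code": it receives its environment explicitly and therefore
   has no free variables of its own. *)
fun addCode (e : env, x : int) : int = x + #n e

(* A closure pairs the code with its environment, like the struct above. *)
type closure = {code : env * int -> int, env : env}

fun makeAddN (n : int) : closure = {code = addCode, env = {n = n}}

(* Application extracts both parts and passes the environment to the code. *)
fun apply ({code, env} : closure, x : int) : int = code (env, x)

val add5 = makeAddN 5
val result = apply (add5, 10)
```

In this sketch the environment type is fixed, whereas a real closure conversion must hide it so that closures with different free variables can share one type.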

It is a daunting task to compile a high-level language, such as SML, to a relatively low-level
language, such as C, and even more daunting to compile to an extremely low-level
language,  such  as  machine  code.  The  only  feasible  approach  to  overcome  the  complexity 
is  to  break  the  task  into  a  series  of  simpler  compilers  that  successively  map  their  source 
language  to  closely  related,  but  slightly  simpler  target  languages.  Taking  the  sequential 
composition  of  these  simpler  compilers  yields  a  compiler  from  the  original  source  language 
to  the  final  target  language.  Decomposing  a  compiler  into  a  series  of  simpler  compilers 
has  an  added  benefit:  Correctness  of  the  entire  compiler  can  be  established  by  proving 
the  correctness  of  each  of  the  simpler  compilers. 

The  initial  task  a  compiler  writer  faces  is  deciding  how  to  break  her  compiler  into 
smaller,  more  manageable  compilation  steps.  She  must  decide  what  language  feature(s) 
to  eliminate  in  each  step  of  compilation  and  she  must  develop  a  strategy  for  how  this  is 
to  be  accomplished.  Next,  for  each  stage  of  compilation  she  must  design  and  specify  the 
intermediate  target  languages.  This  includes  formally  specifying  a  dynamic  semantics  so 
that  we  know  precisely  what  each  target  language  construct  means.  Then,  the  compiler 
writer  must  formulate  a  precise,  but  sufficiently  high-level  description  of  an  algorithm 
that  maps  the  source  language  to  the  target  language  for  each  stage.  Finally,  the  compiler 
writer must prove each of the translation algorithms correct. A translation is correct if,
when we run the source program (using the source language dynamic semantics) and
run the translation of the program (using the target language dynamic semantics),
we get “equivalent” answers. For simple answers, such as strings or integers, “equivalent”
usually  means  syntactic  equality.  However,  weaker,  semantic  notions  of  equivalence  are 
needed  to  relate  more  complex  objects  such  as  functions. 
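As a small illustration of this notion of correctness, consider a toy compiler from arithmetic expressions to a stack machine. The two languages and every name below are invented for the sketch and are not the languages used in this thesis. Since answers are integers, "equivalent" is simply equality.

```sml
(* Source language: arithmetic expressions. *)
datatype exp = Num of int | Add of exp * exp | Mul of exp * exp

(* Source dynamic semantics. *)
fun eval (Num n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Mul (a, b)) = eval a * eval b

(* Target language: instructions for a stack machine. *)
datatype instr = Push of int | AddI | MulI

(* The translation from source to target. *)
fun compile (Num n) = [Push n]
  | compile (Add (a, b)) = compile a @ compile b @ [AddI]
  | compile (Mul (a, b)) = compile a @ compile b @ [MulI]

exception Stuck

(* Target dynamic semantics: run the instructions against a stack. *)
fun run ([], [v]) = v
  | run (Push n :: is, st) = run (is, n :: st)
  | run (AddI :: is, y :: x :: st) = run (is, x + y :: st)
  | run (MulI :: is, y :: x :: st) = run (is, x * y :: st)
  | run _ = raise Stuck

(* The correctness statement, checked here for a single program. *)
fun agrees e = eval e = run (compile e, [])

val sample = Add (Num 2, Mul (Num 3, Num 4))
val ok = agrees sample
```

Checking agrees on sample programs is only a test. A proof of correctness establishes the property for every source expression, typically by induction on its structure.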

In  this  thesis,  I  demonstrate  this  methodology  by  deriving  key  parts  of  a  compiler 
from  a  simple  ML-like  functional  language  to  a  relatively  low-level  language  that  makes 
representations,  calling  conventions,  closures,  allocation,  and  garbage  collection  explicit. 
I formulate the translation as a series of type-directed and type-preserving maps. By
type-directed, I mean that the source language of each stage is statically typed and source
types  are  used  to  guide  the  compilation  at  every  step.  For  instance,  given  a  generic 
structural equality operation at the source level, we can select the code that implements
the  operation  according  to  the  type  assigned  to  the  arguments  of  the  operation.  If  that 
type  is  int,  we  use  integer  equality,  and  if  the  type  is  float,  we  use  floating  point  equality, 
and  so  on. 
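A minimal sketch of this kind of type-directed selection follows. The datatypes ty and code and the function genEq are hypothetical stand-ins for a compiler's internal representations, and operands are named by strings purely for readability.

```sml
(* Hypothetical representation of source types inside the compiler. *)
datatype ty = Int | Float | Pair of ty * ty

(* Hypothetical target code for an equality test on two operands. *)
datatype code =
    IntEq of string * string
  | FloatEq of string * string
  | And of code * code

(* Choose the comparison by inspecting the type assigned to the operands.
   For pairs, compare the components; "a.1" stands for a projection. *)
fun genEq (Int, a, b) = IntEq (a, b)
  | genEq (Float, a, b) = FloatEq (a, b)
  | genEq (Pair (t1, t2), a, b) =
      And (genEq (t1, a ^ ".1", b ^ ".1"), genEq (t2, a ^ ".2", b ^ ".2"))

val c = genEq (Pair (Int, Float), "x", "y")
```

Note that genEq has no case for a type variable. The selection works only when the operand type is fully known at compile time.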

By  a  type-preserving  translation,  I  mean  the  following:  first,  both  the  source  and 
the  target  languages  are  typed.  Second,  a  type-preserving  translation  specifies  a  type 
translation in addition to a term translation. Third, if a source expression e has type τ,
the term translation of e is e', and the type translation of τ is τ', then e' has type τ'.
Therefore,  assuming  the  input  is  well-typed,  so  is  the  output. 

By  using  a  typed  target  language  in  addition  to  a  typed  source  language,  we  ensure 
that  any  later  translations  in  the  compiler  can  continue  to  take  advantage  of  types  for  their 
own  purpose.  For  example,  in  Chapter  7,  I  take  advantage  of  these  types  to  implement 
garbage  collection. 

In  type-directed  translation,  we  not  only  use  types  to  guide  the  translation,  but  we 
also  use  types  to  argue  that  the  translation  is  correct.  Indeed,  it  is  the  presence  of 
types  that  allows  us  to  define  what  it  means  for  the  translation  to  be  correct!  Thus,  the 
contribution  of  types  to  the  compilation  process  is  many  fold:  We  use  types  to  select 
appropriate  representations,  calling  conventions,  and  primitive  operations,  and  we  use 
types  to  prove  that  the  compiler  is  correct. 


1.2  The  Issue  of  Variable  Types 

Modern programming languages, such as C++, CLU, Modula-3, Ada, Standard ML, Eiffel,
and Haskell, all provide type systems that are much more expressive than the simple,
monomorphic  systems  of  C  and  Pascal.  In  particular,  each  of  these  languages  supports 
at  least  one  notion  of  unknown  or  variable  type.  Variable  types  arise  in  conjunction 
with  two  key  language  advances:  abstract  data  types  and  polymorphism.  These  features 
are  the  building  blocks  of  relatively  new  language  features  including  modules,  generics, 
objects,  and  classes. 

The  SML  code  in  Figure  1.1  provides  an  example  use  of  an  unknown  type.  The  code 
implements a merge sort on a list of values of uniform, but unknown type, denoted by
α. The function sort takes a predicate lt (less-than) of type α * α -> bool, a list of
α values, and produces a list of α values. We can apply the sort function to a list of
integers,  passing  integer  less-than  as  the  comparison  operator: 

sort (op < : int*int->bool) [5,7,2,153,9,10]

Alternatively,  we  can  apply  the  sort  function  to  a  list  of  floating  point  values,  passing 
floating  point  less-than  as  the  comparison  operator: 


sort  (op  <  :  real*real->bool)  [138.0,3.1415,4.79] 
fun sort (lt: α*α->bool) (l:α list) : α list =
  let fun merge (nil,l) = l
        | merge (l,nil) = l
        | merge (x::tx,y::ty) =
            if lt(x,y) then x :: merge(tx,y::ty)
            else y :: merge(x::tx,ty)

      fun split ([],one,two) = (sort lt one,sort lt two)
        | split (x::tl,one,two) = split (tl,two,x::one)
  in
    case l of
      (_::_::_) => merge (split (l,[],[]))
    | _ => l
  end


Figure  1.1:  A  Polymorphic  Merge  Sort  Function 


In  fact,  we  can  pass  a  list  of  any  type  to  the  sort  routine  provided  we  have  an  appropriate 
comparison  operation  for  that  type.  By  abstracting  the  component  type  of  the  list  as  a 
type  variable,  we  can  write  the  sort  code  once  and  use  it  at  as  many  types  as  we  like.  For 
monomorphic  languages  like  Pascal,  which  strictly  enforce  types,  a  sort  implementation 
must be copied for each instantiation of α. This can waste code space and make program
maintenance  more  difficult.  For  instance,  if  we  find  a  bug  in  the  sort  implementation, 
then  we  must  make  sure  to  eliminate  the  bug  in  all  copies. 

The  ability  to  use  code  at  different  types,  as  in  the  preceding  sort  example,  is  usually 
called polymorphism, meaning “many changing”. Polymorphism is closely related to the
notion  of  abstract  data  types  (ADTs).  ADTs  are  objects  that  implement  some  data 
type  and  its  corresponding  operations,  but  hold  the  representation  of  the  data  type  and 
the  implementation  of  the  operations  abstract.  For  example,  the  following  SML  code 
implements  a  stack  of  integers  as  an  ADT  via  the  abstype  mechanism: 

abstype stack = Stack of int list
with
  val empty = Stack nil
  fun push (x, Stack s) = Stack (x :: s)
  exception Empty
  fun pop (Stack nil) = raise Empty
    | pop (Stack (x :: s)) = (x, Stack s)
end
The type system of SML prevents the client of an abstype from examining the representation
of the abstracted type. Hence, if we try to use a stack as if it is an integer list, we
will get a type error. A key advantage of abstracting the type is that, in principle, we
can  separately  compile  the  definition  of  the  ADT  from  its  uses.  Furthermore,  we  should 
be  able  to  change  the  representation  of  the  abstracted  type  without  having  to  recompile 
clients  of  the  abstraction.  For  example,  we  could  use  a  vector  to  represent  the  stack  of 
integers  instead  of  a  list. 
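To make the vector alternative concrete, here is one possible re-implementation of the same abstype. It is a sketch rather than a recommended design (pushing copies the whole vector), and it uses only the Vector structure of the SML Basis Library. The client line at the end type checks unchanged against either representation.

```sml
abstype stack = Stack of int vector
with
  val empty = Stack (Vector.fromList [])
  (* The top of the stack is kept at index 0. *)
  fun push (x, Stack s) = Stack (Vector.concat [Vector.fromList [x], s])
  exception Empty
  fun pop (Stack s) =
    if Vector.length s = 0 then raise Empty
    else (Vector.sub (s, 0),
          Stack (Vector.tabulate (Vector.length s - 1,
                                  fn i => Vector.sub (s, i + 1))))
end

(* Client code: identical for the list-based and vector-based stacks. *)
val (top, rest) = pop (push (2, push (1, empty)))
```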

But  how  do  we  compile  in  the  presence  of  variable  types?  Consider,  for  example,  the 
following  issues: 

1.  Functions  can  have  arguments  of  unknown  type: 

fun id (x:α) : α = x

Since α can be instantiated to any type, what register should we use to pass the
argument  to  id?  Should  we  use  a  floating  point  or  general  purpose  register? 

2.  The  following  function  creates  a  record  of  values  whose  types  are  unknown: 

fun foo (x:α,y:β) = (x,y,x,y)

How  much  space  should  we  allocate  for  the  data  structure?  How  do  we  align  the 
components  of  the  record  to  support  efficient  access?  Should  we  add  padding? 
Should  we  pack  the  fields  of  the  record? 

3.  In  languages  such  as  SML,  n-argument  functions  are  represented  by  functions  taking 
a single n-tuple as an argument. This makes the language uniform and simplifies the
semantics.  But  for  efficiency,  we  want  to  “flatten”  a  tuple  argument  into  multiple 
arguments.  If  the  argument  to  a  function  has  a  variable  type,  then  how  do  we  know 
if  we  should  flatten  the  argument? 

4.  When compiling an ad-hoc polymorphic operation such as structural equality (e.g.,
eq(e1, e2)) and the type of the arguments is a variable, what code should we generate?
Should we generate an integer comparison, floating point comparison, or code
that extracts components and recursively compares them for equality?

5.  How  do  we  determine  the  size  and  pointers  of  a  value  of  unknown  type  so  that  we 
can  perform  tracing  garbage  collection? 
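The flattening question in issue 3 can be seen in a short, runnable SML fragment. The functions add and twice are made up for this illustration, and 'a is SML's ASCII spelling of the type variable α.

```sml
(* Argument type known: the compiler may pass x and y in two registers. *)
fun add (x : int, y : int) : int = x + y

(* Argument type unknown: 'a may later be instantiated with a tuple type,
   so when compiling twice we cannot tell whether x should be flattened. *)
fun twice (f : 'a -> 'a) (x : 'a) : 'a = f (f x)

val s = add (3, 4)
val n = twice (fn k => k + 1) 0                  (* 'a = int *)
val p = twice (fn (a, b) => (b, a + b)) (0, 1)   (* 'a = int * int *)
```

The single compiled body of twice must call f correctly in both uses, even though one f expects an integer and the other expects a pair.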

In  this  section,  I  explore  existing  solutions  to  these  issues  and  discuss  their  relative 
strengths  and  weaknesses.  As  I  will  show,  none  of  the  existing  approaches  preserves  all 
of  the  following  desirable  implementation  properties: 


•  separate compilation

•  natural representations and calling conventions

•  fully polymorphic definitions or fully abstract data types

1.2.1  Previous  Approach:  Eliminate  Variable  Types 

The  easiest  way  to  implement  a  language  with  variable  types  is  to  contrive  either  the 
language  or  the  implementation  so  that  all  of  the  variable  types  are  eliminated  before 
compilation  begins.  This  allows  us  to  use  a  standard,  monomorphic  compiler. 

The  “elimination”  approach  has  been  used  in  various  guises  by  implementations  of 
C++  [114],  Ada  [121],  NESL  [20],  and  Gofer  [74]  to  support  ADTs  and  polymorphism. 

•  When  defining  a  new  class  in  C++,  the  definition  is  placed  in  a  “.h”  file.  The 
definition  is  #included  by  any  client  code  that  wishes  to  use  the  abstraction. 
Hence,  the  compiler  can  always  determine  the  representation  of  an  abstract  data 
type.  The  type  system  enforces  the  abstracted  type  within  the  client,  but  the  first 
stage  of  compilation  eliminates  the  abstracted  type  variable,  and  replaces  it  with 
its  implementation  definition. 

•  When defining an ADT via the package mechanism of Ada, it is sometimes necessary
to expose the representation of the ADT in the interface of the package. This
representation  is  “hidden”  in  a  private  part  of  the  interface.  Again,  the  type  system 
of  the  language  enforces  the  abstraction,  but  the  first  stage  of  compilation  replaces 
the  abstract  type  variable  with  the  implementation  representation. 

•  NESL is a programming language for parallel computations that allows programmers
to define polymorphic functions [20]. However, the NESL implementation
delays compiling any polymorphic definitions. Instead, whenever a polymorphic
function is instantiated with a particular type, the type is substituted for the occurrences
of the type variable within the polymorphic code. The resulting monomorphic
code is compiled. A caching scheme is used to minimize code duplication.

•  Gofer,  a  dialect  of  Haskell,  provides  both  polymorphism  and  type  classes.  Mark 
Jones  constructed  an  implementation  that,  like  the  NESL  implementation,  performs 
all polymorphic instantiation at compile time [73].
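The effect of instantiating polymorphic definitions at compile time can be pictured as follows. The function swap and the specialized names are invented for this sketch. Real implementations generate such copies internally rather than in source form.

```sml
(* Polymorphic source definition. *)
fun swap (x : 'a, y : 'b) : 'b * 'a = (y, x)

(* Uses at two different instantiations. *)
val a = swap (1, 2.5)
val b = swap ("s", true)

(* What an "eliminating" implementation effectively compiles instead:
   one monomorphic copy per distinct instantiation. *)
fun swap_int_real (x : int, y : real) : real * int = (y, x)
fun swap_string_bool (x : string, y : bool) : bool * string = (y, x)

val a' = swap_int_real (1, 2.5)
val b' = swap_string_bool ("s", true)
```

Each copy can use natural representations for its arguments, but the two copies share no code, and both must be regenerated whenever swap changes.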

Unfortunately,  the  “eliminate  variable  types”  approach  has  many  drawbacks.  One 
drawback  is  that  polymorphic  code  is  never  shared.  Instead,  each  polymorphic  definition 
is  copied  at  least  once  for  each  unique  instantiation.  This  can  have  a  serious  effect  on  both 
compile  times  and  instruction  cache  locality.  For  C++,  the  definitions  in  the  “.h”  file 
must  be  processed  each  time  a  client  is  compiled.  Newer  compilers  attempt  to  cache  the 
results  of  this  processing  in  a  separate  file  precisely  to  avoid  this  compilation  overhead. 
The caching scheme used by NESL ensures that exactly one copy is made for a given
type,  but  no  code  is  actually  shared  across  different  instantiations.  Both  caching  schemes 
introduce  a  coherence  problem:  When  the  polymorphic  definition  is  updated,  the  cached 
definitions  must  be  discarded.  For  languages  like  Gofer,  Haskell,  and  SML,  which  provide 
nested  polymorphic  definitions,  it  is  possible  that  the  number  of  copies  of  a  polymorphic 
definition  could  grow  exponentially  with  the  number  of  type  variables  in  the  definition. 
(However,  Jones  reports  that  this  does  not  occur  in  practice  [73].) 

Even  if  code  size,  compile  times,  and  instruction  cache  locality  were  not  an  issue,  the 
“eliminate”  approach  sacrifices  separate  compilation  of  an  ADT  or  polymorphic  definition 
from  its  uses.  For  example,  if  we  change  the  implementation  of  an  ADT  implemented 
using  an  Ada  package,  then  we  must  also  change  the  private  portion  of  the  package 
specification. Since all clients depend upon this specification, changing the ADT implementation
requires that all clients be recompiled. Similarly, a simple change to a class
definition  in  C++  can  require  the  entire  program  to  be  recompiled. 

Increasingly,  we  are  moving  away  from  a  world  where  we  have  access  to  all  of  the  source 
files  of  a  program,  and  where  we  can  “batch”  process  the  compilation,  linking,  and  loading 
of  a  program.  For  example,  vendor-supplied,  dynamically-linked  libraries  (e.g.,  Xlib,  Tk) 
are  now  the  norm  instead  of  the  exception.  Often,  it  is  impossible  to  get  the  source 
code  for  such  libraries  and  it  is  prohibitively  time-consuming  to  recompile  them  for  each 
application,  especially  during  development.  We  now  have  languages  such  as  Java  [51]  and 
Obliq  [27,  28]  where  objects  and  code  are  dynamically  transmitted  from  one  machine  to 
another  via  a  network,  compiled  to  a  native  representation,  and  then  dynamically  linked 
into  a  running  program.  Hence,  the  ability  to  compile  program  components  separately  is 
becoming  increasingly  important  and  any  compilation  methodology  must  provide  some 
level  of  support  for  separate  compilation. 

Finally,  for  many  newer  programming  languages,  it  is  simply  impossible  to  eliminate 
all  polymorphism  or  all  ADTs  at  compile  time.  Consider,  for  example,  a  language  that 
supports first-class polymorphic definitions. Such objects can be placed in data structures,
passed as arguments to functions, and so forth. Thus, determining all of the types
that  instantiate  a  given  definition  becomes  in  general  undecidable. 

1.2.2  Previous  Approach:  Restrict  Representations 

A  different  approach  to  compiling  in  the  presence  of  variable  types  is  to  restrict  the 
types that can instantiate a type variable. This approach, known as boxing, restricts
type variables so that, no matter what the actual type is, the representation of the value
has the same size. As an example, Modula-3 allows only pointer types (e.g., ptr[τ]) to
be used as the implementation of an abstract type. Assuming all pointers are the same
size¹, we can always allocate a data structure containing values of type ptr[t], even if t
is  unknown.  Similarly,  we  know  that  such  values  will  be  passed  in  general  purpose  as 
opposed  to  floating  point  registers. 

Languages  like  SML  do  not  make  a  restriction  on  polymorphism  or  variable  types  at 
the  source  level,  but  almost  all  implementations  use  such  a  restriction  in  compilation.  In 
essence,  they  perform  a  type-directed  translation  that  maps  variable  types  (t)  to  pointer 
types (ptr[t]). Unfortunately, a naive translation that always maps type variables to
pointer  types  is  not  type-preserving.  Consider  for  example  the  polymorphic  identity 
function  and  its  use  at  some  type: 

let fun id (x:α) : α = x
in
  (id : float->float) 3.1415
end

The  naive  translation  yields: 

let fun id (x:ptr[α]) : ptr[α] = x
in
  (id : ptr[float]->ptr[float]) 3.1415
end

However,  this  translation  is  ill-typed  because  id  takes  a  pointer  as  an  argument,  but  the 
literal  3.1415  is  a  floating  point  value.  Since  the  translation  is  ill-typed,  the  rest  of  the 
compiler  will  produce  erroneous  code.  For  example,  the  translation  of  the  application 
of  id  to  the  floating  point  value  may  place  the  argument  into  a  floating  point  register, 
whereas  the  code  of  the  function  will  be  translated  with  the  expectation  that  the  argument 
is  in  a  general  purpose  register. 

The  problem  is  that  in  general,  it  is  impossible  to  tell  whether  or  not  a  value  will  be 
passed  to  a  polymorphic  function.  If  a  value  is  passed  as  an  argument  of  unknown  type 
to some routine, then the value must be boxed (i.e., represented as a pointer). Because
it is impossible to tell whether or not a value will be passed to a polymorphic function,
most ML compilers, including Poly/ML [88], Bigloo [107], Caml [126], and older versions
of SML/NJ [9], box all objects.

Boxing supports separate compilation and dynamic linking, but unfortunately, it consumes
space and time because of the extra indirection that is introduced. For example,
to support polymorphic array operations, an array of floating point values must be
represented by an array of pointers to singleton records that contain the actual floating point
values (see Figures 1.2 and 1.3). An extra word is used as the pointer for each array
element. For systems such as SML/NJ that use tagging garbage collection, an additional
tag word is required for each element. Hence, the polymorphic array can consume twice
as much space as its monomorphic counterpart. Furthermore, accessing an array element
requires an additional memory operation. As memory speeds become slower relative to
processor speeds, this extra memory operation per access becomes more costly. Finally,
in the presence of a copying garbage collector, the elements of the array could be
scattered across memory. This could destroy the spatial locality of the array, resulting in
increased data cache misses and even longer access times.

Figure 1.2: Natural Representation of a Floating Point Array

Figure 1.3: Boxed Representation of a Floating Point Array

¹ Even this is a dangerous assumption in many environments, including MS-DOS versions of Borland
C, where distinctions are made between “near” and “far” pointers.
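The space and indirection costs described above can be sketched in Python. This is only an illustrative model, not the layout used by any particular SML implementation: the names natural_bytes, boxed_bytes, natural_sub, and boxed_sub are hypothetical, and the size model assumes one pointer word plus tag_words header words per boxed element.

```python
import struct
from array import array

WORD = struct.calcsize("P")    # bytes in a machine pointer (platform dependent)
FLOAT = struct.calcsize("d")   # bytes in an unboxed double

def natural_bytes(n):
    """Payload bytes for n floats stored contiguously, as in Figure 1.2."""
    return n * FLOAT

def boxed_bytes(n, tag_words=1):
    """Payload bytes for n floats stored as pointers to tagged singleton
    records, as in Figure 1.3 (an assumed model: pointer + tags + float)."""
    return n * (WORD + tag_words * WORD + FLOAT)

# Natural representation: the doubles sit side by side in one buffer.
natural = array("d", [3.14159, 2.71828, 42.0, 0.0])

# Boxed representation: each slot refers to a separate one-field record.
boxed = [[x] for x in natural]

def natural_sub(a, i):
    return a[i]        # a single access into the buffer

def boxed_sub(a, i):
    return a[i][0]     # follow the pointer, then read the record
```

Under this model a boxed array of doubles takes at least twice the space of the natural one on a 32-bit machine, and more on a 64-bit machine, because every element pays for a pointer and a tag in addition to the value itself.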

In addition to causing performance problems, boxing also interferes with interoperability.
As with tags to support garbage collection, adding extra indirection can impede
communication  with  systems  that  do  not  use  boxing.  In  particular,  it  becomes  difficult 
to  communicate  with  libraries,  runtime  services,  and  operating  system  services,  because 
they  tend  to  be  written  in  low-level  languages  such  as  C  or  Fortran  that  do  not  provide 
variable  types.  Extra  code  must  be  written  to  “marshal”  a  data  structure  from  its  boxed 
representation  to  the  representation  used  by  the  library,  runtime,  or  operating  system. 

Finally, although boxing addresses many of the issues of compiling in the presence
of variable types, it does not help us eliminate overloaded operators (such as structural
equality) or perform garbage collection. Hence, standard implementations of SML both
box and tag all values to support their advanced language features. As a direct result,
the quality of the code emitted by most SML compilers, even when these features are
not used, is far below that of the code emitted by compilers for languages like C or Fortran.

1.2.3  Previous  Approach:  Coercions 

Because  boxing  and  tagging  are  so  expensive,  a  great  deal  of  research  has  gone  into 
minimizing  these  costs  [75,  81,  64,  65,  102,  110].  A  particularly  clever  approach  was 
suggested  by  Xavier  Leroy  for  call-by-value  languages  based  on  the  ML  type  system  [81]. 
The  fundamental  idea  is  to  compile  monomorphic  code  in  exactly  the  same  way  that  it 
is  compiled  in  the  absence  of  variable  types,  and  to  compile  polymorphic  code  assuming 
that  variables  of  unknown  type  are  boxed  and  tagged.  As  we  showed  in  Section  1.2.2,  this 
results  in  a  type  mismatch  when  a  polymorphic  object  is  instantiated.  Leroy’s  solution  is 
to  apply  a  coercion  to  the  polymorphic  object  to  mitigate  this  mismatch.  The  coercion 
is  based  on  the  type  of  the  object  and  the  type  at  which  it  is  being  instantiated.  A 
fascinating  property  of  Leroy’s  solution  is  that,  for  languages  based  on  the  ML  type 
system,  the  appropriate  coercion  to  apply  in  a  given  situation  can  always  be  determined 
at  compile  time. 

As  an  example,  the  naive  boxing  translation  of  the  identity  function  produced  the 
following  incorrect  code: 

let fun id (x:ptr[α]) : ptr[α] = x
in
  (id : ptr[float] -> ptr[float]) 3.1415
end

Leroy’s  translation  fixes  the  mismatch  by  applying  a  boxing  coercion  to  the  argument  of 
the  polymorphic  function  and  an  unboxing  coercion  to  the  result: 

let fun id (x:ptr[α]) : ptr[α] = x
in
  unbox[float] ((id : ptr[float] -> ptr[float]) (box[float] (3.1415)))
end

Assuming box and unbox convert a value to and from a pointer representation and add
any necessary tags, the resulting code is operationally correct. In general, a polymorphic
object of type ∀α.τ[α] is compiled with the type ∀α.τ[ptr[α]]. When instantiated with
some type τ', the object has the type τ[ptr[τ']], but the object is expected to have the
type τ[τ']. A coercion is applied to the instantiated object to correct the mismatch.
The coercion is calculated via a function S that maps the type scheme τ[α] and the
instantiating type τ' to a term as follows:


S[α, τ']                = λx. unbox[τ'](x)
S[int, τ']              = λx. x
S[float, τ']            = λx. x
S[(τ₁ × ··· × τₙ), τ']  = λx. let x₁ = π₁ x, ···, xₙ = πₙ x
                              in (S[τ₁,τ'](x₁), ···, S[τₙ,τ'](xₙ))
S[τ₁ -> τ₂, τ']         = λf. λx. S[τ₂,τ'](f (G[τ₁,τ'] x))


The  definition  of  S  at  arrow  types  uses  the  dual  coercion  function  G: 


G[α, τ']                = λx. box[τ'](x)
G[int, τ']              = λx. x
G[float, τ']            = λx. x
G[(τ₁ × ··· × τₙ), τ']  = λx. let x₁ = π₁ x, ···, xₙ = πₙ x
                              in (G[τ₁,τ'](x₁), ···, G[τₙ,τ'](xₙ))
G[τ₁ -> τ₂, τ']         = λf. λx. G[τ₂,τ'](f (S[τ₁,τ'] x))


The  coercions  generated  by  S  and  G  deconstruct  a  value  into  its  components  until  we 
reach  a  base  type  or  a  type  variable.  The  coercion  at  a  base  type  is  the  identity  but  the 
coercion  at  a  variable  type  requires  either  boxing  or  unboxing  that  component.  Once  the 
components  have  been  coerced,  the  aggregate  value  is  reassembled.  Hence,  it  is  fairly 
easy  to  show  that: 

S[τ[α], τ'] : τ[ptr[τ']] -> τ[τ']

G[τ[α], τ'] : τ[τ'] -> τ[ptr[τ']]

and  thus  S  and  G  appropriately  mitigate  the  type  mismatch  that  occurs  at  polymorphic 
instantiation. 
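The definitions of S and G can be sketched as ordinary functions over run-time descriptions of types. The following Python is a minimal sketch under stated assumptions, not Leroy's implementation: type schemes are encoded as strings and tuples (VAR, INT, FLOAT, prod, and arrow are hypothetical encodings), Box is a hypothetical stand-in for the pointer representation, and tuples stand in for n-ary products.

```python
from dataclasses import dataclass

# Assumed encoding of type schemes with a single type variable.
VAR, INT, FLOAT = "var", "int", "float"

def prod(*ts):
    return ("prod",) + ts

def arrow(t1, t2):
    return ("arrow", t1, t2)

@dataclass(frozen=True)
class Box:
    """Stand-in for box[ty]: a pointer to a cell holding one value."""
    ty: object
    contents: object

def S(ty, inst):
    """Coercion from ty[ptr[inst]] to ty[inst]; unboxes at the type variable."""
    if ty == VAR:
        return lambda x: x.contents
    if ty in (INT, FLOAT):
        return lambda x: x
    if ty[0] == "prod":
        parts = [S(t, inst) for t in ty[1:]]
        return lambda x: tuple(c(xi) for c, xi in zip(parts, x))
    if ty[0] == "arrow":
        to_arg, from_res = G(ty[1], inst), S(ty[2], inst)
        return lambda f: lambda x: from_res(f(to_arg(x)))
    raise ValueError(f"unknown type: {ty!r}")

def G(ty, inst):
    """Dual coercion from ty[inst] to ty[ptr[inst]]; boxes at the type variable."""
    if ty == VAR:
        return lambda x: Box(inst, x)
    if ty in (INT, FLOAT):
        return lambda x: x
    if ty[0] == "prod":
        parts = [G(t, inst) for t in ty[1:]]
        return lambda x: tuple(c(xi) for c, xi in zip(parts, x))
    if ty[0] == "arrow":
        to_arg, from_res = S(ty[1], inst), G(ty[2], inst)
        return lambda f: lambda x: from_res(f(to_arg(x)))
    raise ValueError(f"unknown type: {ty!r}")
```

For the identity function, S(arrow(VAR, VAR), FLOAT) wraps the polymorphic code so that the caller passes and receives an unboxed float, while the wrapped function only ever sees a Box. This mirrors the unbox[float] and box[float] coercions in the translated id example.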

The coercion approach offers the best mix of features from the set of solutions
presented thus far. In particular, as with full boxing, it supports separate compilation and
code sharing. Unlike full boxing, monomorphic code does not have to pay the penalties of
boxing and tagging. Leroy found that his coercion approach cut execution time by up to
a factor of two for some benchmarks run through his Gallium compiler, notably numeric
codes that manipulate many integer or floating point values. However, for at least one
contrived program with a great deal of polymorphism, the coercion approach slowed the
program by more than a factor of two [81]. Nevertheless, his coercion approach has an
attractive property: you pay only for the polymorphism you use.

Other researchers have also found that eliminating boxing and tagging through
coercions can cut execution times and allocation considerably. For instance, Shao and Appel
were able to improve execution time by about 19% and decrease heap allocation by 36%
via Leroy-style coercions for their SML/NJ compiler [110]. However, much of their
improvement (11% execution time, 30% of allocation) comes by performing a type-directed
flattening of function arguments as part of the coercion process.

Unfortunately, the coercion approach has some practical drawbacks. First, the
coercions operate by deconstructing a value, boxing or unboxing some components, and then
building a copy of the value out of the coerced components. Building a copy of a large
data structure, such as a list, array, or vector, requires mapping a coercion across the
whole data structure. Such coercions can be prohibitively expensive and applying them
may well outweigh the benefits of leaving the data structure unboxed. Second, in the
presence of recursive types (ML data types), refs, or arrays, not only must the
components corresponding to type variables be boxed, but their components must be recursively
boxed [81]. Third, and perhaps most troublesome, it is difficult if not impossible to make
a copy of a mutable data structure such as an array. The problem is that updates to
the copy must be reflected in the original data structure and vice versa. Hence, it is
impossible to apply a coercion to refs or arrays and consequently, the components of
such data structures must always be boxed.

It  is  possible  to  represent  refs  and  arrays  as  a  pair  of  “get”  and  “set”  functions 
whose  shared  closure  contains  the  actual  ref  cell  or  array.  Then  the  standard  functional 
coercions  can  be  applied  to  the  get  and  set  operations  to  yield  a  coerced  mutable  data 
structure.  However,  having  to  perform  a  function  call  to  access  a  component  of  an  array 
can  easily  offset  any  benefits  from  leaving  the  array  unboxed. 
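The coherence problem with copying, and the “get”/“set” workaround, can be illustrated with a small Python sketch. All names here (Box, unbox_by_copy, make_array, unbox_view) are hypothetical and serve only as an illustration of the idea, under the assumption that a boxed element is a one-field record.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Box:
    """Stand-in for a boxed element."""
    contents: object

def unbox_by_copy(boxed_cells):
    """Coerce by copying: the result no longer shares state with the original."""
    return [b.contents for b in boxed_cells]

def make_array(cells):
    """Represent a mutable array as get/set closures over the shared cells."""
    def get(i):
        return cells[i]
    def set_(i, v):
        cells[i] = v
    return get, set_

def unbox_view(get, set_):
    """Coerce the operations instead of the data: unbox on read, box on write."""
    return (lambda i: get(i).contents,
            lambda i, v: set_(i, Box(v)))

cells = [Box(1.0), Box(2.0)]

copy = unbox_by_copy(cells)
copy[0] = 9.0                    # lost update: cells[0] is still Box(1.0)

get, set_ = make_array(cells)
uget, uset = unbox_view(get, set_)
uset(1, 7.0)                     # visible through the original: cells[1] is Box(7.0)
```

The view stays coherent with the original array, but every element access now goes through a function call, which is the cost noted above.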

Finally,  the  coercion  approach  to  variable  types  is  simply  a  stop-gap  measure.  It  takes 
advantage  of  certain  properties  of  the  ML  type  system  -  notably  the  lack  of  first-class 
polymorphic  objects  -  to  ensure  that  the  appropriate  coercion  can  always  be  calculated 
at  compile  time.  This  approach  breaks  down  when  we  move  to  a  language  with  first-class 
polymorphism. 


1.3  Dynamic  Type  Dispatch 

There  is  an  approach  for  compiling  in  the  presence  of  variable  types,  first  suggested  by 
the  Napier  ’88  implementation  [97],  which  avoids  the  drawbacks  of  boxing  or  coercions 
without  sacrificing  separate  compilation.  The  idea  is  to  delay  deciding  what  code  to 
select  until  types  are  known.  This  is  accomplished  by  passing  types  that  are  unknown 
at  compile-time  to  primitive  operations.  Then,  the  operations  can  analyze  the  type  in 
order  to  select  and  dispatch  to  the  appropriate  code  needed  to  manipulate  the  natural 
representation  of  an  object.  I  call  such  an  approach  dynamic  type  dispatch. 

For  example,  a  polymorphic  subscript  function  on  arrays  might  be  compiled  into  the 
following  pseudo-code: 

sub = Λα. typecase α of
          int    => intsub
        | float  => floatsub
        | ptr[τ] => ptrsub[ptr[τ]]

assuming the following operations, where we elide the “ptr[-]” around arrow and array
types for clarity:

intsub         : [intarray, int] -> int
floatsub       : [floatarray, int] -> float
ptrsub[ptr[τ]] : [ptrarray[ptr[τ]], int] -> ptr[τ]

Here, sub is a function that takes a type argument (α), and then performs a case analysis
to determine the appropriate specialized subscript function that should be returned. For
example, sub[int] returns the integer subscript function that expects an array of integers,
whereas sub[float] returns the floating point subscript function that expects a double-word
aligned array of floating point values. All other types are pointers, so we assume
the array has boxed components and thus sub returns the boxed subscript function at
the appropriate large type.
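The typecase in sub can be mimicked in Python by passing an explicit description of the type at run time. This is a sketch under stated assumptions rather than the thesis's target language: types are encoded as strings and tuples, and the "natural" arrays are modelled as packed byte buffers read with the standard struct module (4-byte integers and 8-byte doubles, little-endian), while pointer arrays are ordinary lists.

```python
import struct

# Assumed run-time encoding of types.
INT, FLOAT = "int", "float"

def ptr(t):
    return ("ptr", t)

def intsub(buf, i):
    """Subscript into a packed array of 4-byte integers."""
    return struct.unpack_from("<i", buf, 4 * i)[0]

def floatsub(buf, i):
    """Subscript into a packed array of 8-byte doubles."""
    return struct.unpack_from("<d", buf, 8 * i)[0]

def ptrsub(ty):
    """Subscript into an array of pointers (a list of references)."""
    def subscript(cells, i):
        return cells[i]
    return subscript

def sub(ty):
    """typecase on the type argument: select the specialized subscript."""
    if ty == INT:
        return intsub
    if ty == FLOAT:
        return floatsub
    if isinstance(ty, tuple) and ty[0] == "ptr":
        return ptrsub(ty)
    raise TypeError(f"no subscript operation for {ty!r}")

ints = struct.pack("<3i", 10, 20, 30)
floats = struct.pack("<2d", 3.14, 2.5)
second_int = sub(INT)(ints, 1)        # reads bytes 4..7 of the buffer
second_float = sub(FLOAT)(floats, 1)  # reads bytes 8..15 of the buffer
```

Because sub(FLOAT) always evaluates to floatsub, a call site whose type argument is known ahead of time can refer to floatsub directly and skip the dispatch.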

If  the  sub  operation  is  instantiated  with  a  type  that  is  known  at  compile-time  (or 
link-time),  then  the  overhead  of  the  case  analysis  can  be  eliminated  by  duplicating  and 
specializing the definition of sub at the appropriate type. For example, the source
expression

sub(x,4) + 3.14,

will be compiled to the target expression

sub[float](x,4) + 3.14,

since  the  result  of  the  sub  operation  is  constrained  to  be  a  float.  If  the  definition  of  sub  is 
inlined  into  the  target  expression  and  some  simple  reductions  are  performed,  this  yields 
the  optimized  expression: 

floatsub(x,4)  +  3.14. 

Like  the  coercion  approach  to  compiling  with  variable  types,  dynamic  type  dispatch 
supports  separate  compilation  and  allows  us  to  pay  only  for  the  polymorphism  that  we 
use.  In  particular,  monomorphic  code  can  be  compiled  as  if  there  are  no  variable  types. 
Furthermore,  unlike  coercions,  dynamic  type  dispatch  supports  natural  representations 
for  large  data  structures  (such  as  lists,  arrays,  or  vectors)  and  for  mutable  data  structures 
(such  as  arrays).  Instead  of  coercing  the  values  of  the  data  structures,  we  coerce  the 
behavior  of  the  operations.  Hence,  we  do  not  have  to  worry  about  keeping  copies  of  a 
mutable  data  structure  coherent.  As  a  result,  dynamic  type  dispatch  provides  better 
interoperability  than  any  of  the  previously  tried  solutions,  without  sacrificing  separate 
compilation. 

As I will show, dynamic type dispatch also supports tag-free overloaded operations.
For example, we can code an ML-style polymorphic equality routine by dispatching on
a type:

typerec eq[int]            = λ(x,y). =_int(x,y)
      | eq[float]          = λ(x,y). =_float(x,y)
      | eq[ptr[(τ₁ × τ₂)]] = λ(x,y). eq[τ₁](π₁ x, π₁ y) andalso eq[τ₂](π₂ x, π₂ y)
      | eq[ptr[τ₁ -> τ₂]]  = λ(x,y). false

The same approach can be used to dynamically flatten arguments into registers,
dynamically flatten structs and align their components, and so on. Finally, by passing
unknown types to the garbage collector at run-time, dynamic type dispatch supports
tag-free garbage collection.
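The typerec definition of eq given above translates almost directly into a recursive function over type descriptions. The Python below is a minimal sketch, assuming a hypothetical tuple encoding of types and Python tuples as pairs; eq_int and eq_float stand in for the primitive integer and floating point comparisons.

```python
# Assumed run-time encoding of types.
INT, FLOAT = "int", "float"

def ptr(t):
    return ("ptr", t)

def prod(t1, t2):
    return ("prod", t1, t2)

def arrow(t1, t2):
    return ("arrow", t1, t2)

def eq_int(x, y):
    return x == y

def eq_float(x, y):
    return x == y

def eq(ty):
    """Build the equality function for ty by recursion on its structure."""
    if ty == INT:
        return eq_int
    if ty == FLOAT:
        return eq_float
    if isinstance(ty, tuple) and ty[0] == "ptr" and isinstance(ty[1], tuple):
        shape = ty[1]
        if shape[0] == "prod":
            eq1, eq2 = eq(shape[1]), eq(shape[2])
            return lambda x, y: eq1(x[0], y[0]) and eq2(x[1], y[1])
        if shape[0] == "arrow":
            return lambda x, y: False   # functions are never considered equal
    raise TypeError(f"no equality for {ty!r}")
```

Note that the values being compared carry no tags: the structure needed to traverse them comes entirely from the type argument.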

In  short,  dynamic  type  dispatch  provides  a  smooth  transition  from  compilers  for 
monomorphic  languages  to  compilers  for  modern  languages  with  variable  types. 


1.4  Typing  Dynamic  Type  Dispatch 

If  we  are  to  use  types  to  support  register  allocation,  calling  conventions,  data  structure 
layout,  and  garbage  collection,  we  must  propagate  types  through  compilation  to  the  stages 
where  these  decisions  are  made.  Many  of  these  decisions  are  made  after  optimization  or 
code  transformations,  so  it  is  important  that  we  can  propagate  type  information  to  as 
low  a  level  as  possible. 

An intermediate language that supports run-time type dispatch allows us to express
source primitives, such as array subscript or polymorphic equality, as terms in the
language. This exposes the low-level operations of the source primitive to optimization and
transformations that may not be expressible at the source level.

If  we  are  to  use  an  intermediate  language  that  supports  run-time  type  dispatch,  we 
must  be  able  to  assign  a  type  to  terms  that  use  typecase.  But  what  type  should  we 
give a term such as sub, shown previously? We cannot use a parametric type such as
∀α. [array[α], int] -> α, because instantiating sub with int, for instance, yields the intsub
operation of type [intarray, int] -> int, which is not an instantiation of the parametric type.

My  approach  to  this  problem  is  to  consider  a  type  system  that  provides  type  dispatch 
at  the  type  level  via  a  “Typecase”  construct.  For  example,  the  sub  definition  can  be 
assigned  a  type  of  the  form: 


∀α. [SpclArray[α], int] -> α

where the specialized array constructor SpclArray is defined using Typecase as follows:

SpclArray[α] = Typecase α of
                   int    => intarray
                 | float  => floatarray
                 | ptr[τ] => ptrarray[ptr[τ]]

The definition of the constructor parallels the definition of the term: If the parameter
α is instantiated to int, the resulting type is intarray; if the parameter is instantiated to
float, the resulting type is floatarray.
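The way the type-level Typecase tracks the term-level typecase can be checked mechanically. The following Python sketch uses a hypothetical encoding of types as strings and tuples; prim_type records the declared types of intsub, floatsub, and ptrsub as listed earlier, and sub_type computes [SpclArray[α], int] -> α at a given instance. The name parametric_type is likewise an assumption, used only to show why the parametric type fails.

```python
# Assumed encoding of types.
INT, FLOAT = "int", "float"

def ptr(t):
    return ("ptr", t)

def fn(args, res):
    return ("fn", tuple(args), res)

def is_ptr(ty):
    return isinstance(ty, tuple) and ty[0] == "ptr"

def prim_type(ty):
    """Declared type of the primitive that sub selects at type ty."""
    if ty == INT:
        return fn(["intarray", INT], INT)
    if ty == FLOAT:
        return fn(["floatarray", INT], FLOAT)
    if is_ptr(ty):
        return fn([("ptrarray", ty), INT], ty)
    raise TypeError(f"unsupported type {ty!r}")

def spcl_array(ty):
    """Type-level Typecase: the array representation chosen for ty."""
    if ty == INT:
        return "intarray"
    if ty == FLOAT:
        return "floatarray"
    if is_ptr(ty):
        return ("ptrarray", ty)
    raise TypeError(f"unsupported type {ty!r}")

def sub_type(ty):
    """[SpclArray[ty], int] -> ty"""
    return fn([spcl_array(ty), INT], ty)

def parametric_type(ty):
    """[array[ty], int] -> ty, the type that does not work."""
    return fn([("array", ty), INT], ty)
```

At every instance, sub_type agrees with the type of the primitive actually returned, whereas parametric_type(INT) mentions array[int] rather than intarray and so does not match intsub.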

In this thesis, I present a formal calculus called λ_i^ML that provides run-time type
passing and type dispatch operations. The calculus is intended to provide the formal
underpinnings of a target language for compiling in the presence of variable types. I prove
two important properties regarding λ_i^ML: The type system is sound and type checking is
decidable.

In its full generality, λ_i^ML allows types to be analyzed not just by case analysis (i.e.,
typecase), but also via primitive recursion. This allows more sophisticated transformations
to be coded within the target language, yet type checking for the target language
remains decidable.

An  example  of  a  more  sophisticated  translation  made  possible  by  primitive  recursion 
is  one  where  arrays  of  pointers  to  pairs  are  represented  as  a  pointer  to  a  pair  of  arrays. 
For example, an array of ptr[(int × float)] is represented as a pointer to a pair of an intarray
and a floatarray. This representation allows the integer components of the array to be
packed and allows the floating point components to be naturally aligned. It also saves
n - 1 words of indirection for an array of size n, since pairs are normally boxed. The
subscript  operation  for  this  representation  is  defined  using  a  recursive  typecase  construct 
called  typerec  in  the  following  manner: 

typerec sub[int]            = intsub
      | sub[float]          = floatsub
      | sub[ptr[(τ₁ × τ₂)]] = λ[(x,y), i]. (sub[τ₁] [x, i], sub[τ₂] [y, i])
      | sub[ptr[τ]]         = ptrsub[ptr[τ]]

If sub is given a product type, ptr[(τ₁ × τ₂)], it returns a function that takes a pair of
arrays ((x,y)) and an index (i), and returns the pair of values from both arrays at that
index, recursively calling the sub operation at the types τ₁ and τ₂.

The  type  of  this  sub  operation  is: 

∀α. [RecArray[α], int] -> α

where the recursive, specialized array constructor RecArray is defined using a type-level
“Typerec”:

Typerec RecArray[int]            = intarray
      | RecArray[float]          = floatarray
      | RecArray[ptr[(τ₁ × τ₂)]] = ptr[(RecArray[τ₁] × RecArray[τ₂])]
      | RecArray[ptr[τ]]         = ptrarray[ptr[τ]]

Again, the definition of the constructor parallels the definition of the sub operation. If
the parameter is instantiated with int, then the resulting type is intarray (that is,
ptr[intarray] once the elided ptr[-] around array types is restored). If the
parameter is instantiated with ptr[(τ₁ × τ₂)], then the resulting type is a pointer to the
product of RecArray[τ₁] and RecArray[τ₂].
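The pair-of-arrays representation and its recursive subscript can be sketched in Python as follows. This is an illustration under assumptions, not the thesis's implementation: types use a hypothetical tuple encoding, packed arrays come from the standard array module (typecode "q" for integers, "d" for doubles), a Python tuple stands in for the pointer to a pair, and build is a hypothetical helper that constructs the representation from a list of values.

```python
from array import array

# Assumed run-time encoding of types.
INT, FLOAT = "int", "float"

def ptr(t):
    return ("ptr", t)

def prod(t1, t2):
    return ("prod", t1, t2)

def is_pair(ty):
    return (isinstance(ty, tuple) and ty[0] == "ptr"
            and isinstance(ty[1], tuple) and ty[1][0] == "prod")

def rec_array(ty):
    """Type-level Typerec: the representation type of an array of ty."""
    if ty == INT:
        return "intarray"
    if ty == FLOAT:
        return "floatarray"
    if is_pair(ty):
        _, t1, t2 = ty[1]
        return ptr(prod(rec_array(t1), rec_array(t2)))
    return ("ptrarray", ty)

def build(ty, values):
    """Lay out values of type ty in the representation rec_array(ty)."""
    if ty == INT:
        return array("q", values)
    if ty == FLOAT:
        return array("d", values)
    if is_pair(ty):
        _, t1, t2 = ty[1]
        return (build(t1, [v[0] for v in values]),
                build(t2, [v[1] for v in values]))
    return list(values)

def sub(ty):
    """Term-level typerec: subscript for the representation rec_array(ty)."""
    if is_pair(ty):
        _, t1, t2 = ty[1]
        sub1, sub2 = sub(t1), sub(t2)
        return lambda a, i: (sub1(a[0], i), sub2(a[1], i))
    return lambda a, i: a[i]   # intarray, floatarray, and ptrarray all index directly
```

For an array of ptr[(int × float)], build produces one packed integer array and one packed double array, and sub reassembles the pair at the requested index by subscripting both.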


1.5  Overview  of  the  Thesis 

In this thesis, I show that we can take advantage of types to compile languages with
variable types, without losing control over data representations or performance. In particular,
I show that dynamic type dispatch can be used to support efficient calling conventions
and native representations of data without sacrificing efficient monomorphic code,
separate compilation, or tag-free garbage collection. I also show how key pieces of a standard
functional language implementation must be extended to accommodate dynamic type
dispatch. For instance, I show that representation analysis, closure conversion, and garbage
collection can all be extended to work with and take advantage of dynamic type dispatch.

A  significant  contribution  of  my  thesis  is  that  I  formulate  these  aspects  of  language 
implementation  at  a  fairly  abstract  level.  This  allows  me  to  present  concise  proofs  of 
correctness.  Even  if  we  ignore  dynamic  type  dispatch,  exhibiting  compact  formulations 
and  correctness  arguments  for  representation  analysis,  closure  conversion,  and  garbage 
collection  is  a  significant  contribution. 

The  TIL  compiler  demonstrates  not  only  that  dynamic  type  dispatch  is  a  viable 
technique  for  compiling  in  the  presence  of  variable  types,  but  also  that  type-directed 
compilation does not interfere with standard optimizations, such as inlining (β-reduction),
common  sub-expression  elimination,  and  loop  invariant  removal.  Also,  TIL  demonstrates 
that  these  standard  optimizations  are,  for  the  most  part,  sufficient  to  eliminate  the 
overheads  of  dynamic  type  dispatch  for  an  SML-like  language. 

I  now  briefly  outline  the  remainder  of  the  thesis:  In  Chapter  2,  I  present  a  simple,  core 
polymorphic  source  language  called  Mini-ML.  I  define  the  syntax,  dynamic  semantics, 
and  static  semantics  of  the  language.  I  also  state  the  key  properties  of  the  static  semantics 
for  the  language.  Readers  familiar  with  ML-style  polymorphism  may  want  to  skip  this 
chapter,  but  compiler  writers  unfamiliar  with  formal  semantic  specifications  may  find 
this  chapter  illuminating. 

In Chapter 3, I present a core intermediate language called λ_i^ML. This language
provides  the  formal  underpinnings  of  a  calculus  with  dynamic  type  dispatch  that  is 
used  in  the  subsequent  chapters.  I  define  the  syntax,  dynamic  semantics,  and  static 
semantics  of  the  language.  In  Chapter  4,  I  summarize  the  key  semantic  properties  of 
the  formal  calculus,  including  decidability  of  type  checking  and  soundness  of  the  static 
semantics.  Proofs  of  these  properties  follow  for  those  interested  in  the  underlying  type 
theory.  Compiler  writers  may  want  to  skip  these  details. 

In Chapter 5, I define a variant of λ_i^ML, called λ_i^ML-Rep, that makes calling conventions
explicit. I show how to map Mini-ML to λ_i^ML-Rep. In the compilation, I show how to
eliminate polymorphic equality and flatten function arguments, in order to demonstrate
the power and utility of dynamic type dispatch. I establish a suitable set of logical
simulation relations between Mini-ML and λ_i^ML-Rep and use them to prove the correctness
of the translation. I then demonstrate how other language features can be implemented
using dynamic type dispatch, including flattened data structures with aligned
components, unboxed floating point arguments, Haskell-style type classes, and communication
primitives.

In Chapter 6, I show how to map λ_i^ML to a language with closed code, explicit
environments, and explicit closures. This translation, known as closure conversion, is important
because it eliminates functions with free variables. The key difficulty is that I must
account for free type variables as well as free term variables when producing the code
of a closure and thus, environments must contain bindings for both value variables and
type variables. Unlike most accounts of closure conversion, the target language of my
translation is still typed. This allows me to propagate type information through
closure conversion. In turn, this supports other type-directed transformations after closure
conversion, as well as run-time type dispatch and tag-free garbage collection.

In Chapter 7, I provide an operational semantics for a monomorphic subset of
closure-converted λ_i^ML code. The semantics makes the heap and the stack explicit. I formalize
garbage collection as any rewriting rule that drops portions of the heap without affecting
evaluation. I specify and prove correct an abstract formulation of copying garbage
collection based on the abstract syntax of terms (i.e., tags). I then show how types can be
used to support tag-free garbage collection and prove that this approach is sound. Then
I show how to extend the tag-free approach to a type-passing, polymorphic language like
λ_i^ML.

In  Chapter  8,  I  give  an  overview  of  TIL  and  the  practical  issues  involved  in  compiling 
a  real  programming  language  to  machine  code  in  the  presence  of  dynamic  type  dispatch. 
I  also  examine  some  aspects  of  the  performance  of  TIL  code:  I  compare  the  running  times 
and  space  of  TIL  code  against  the  code  produced  by  SML/NJ.  I  also  measure  the  impact 
that  various  type-directed  translations  have  on  both  the  running  time  and  amount  of 
data  allocated. 

Finally,  in  Chapter  9,  I  present  a  summary  of  the  thesis,  and  discuss  future  directions. 


Chapter  2 

A  Source  Language:  Mini-ML 


In  this  chapter,  I  specify  a  starting  source  language,  called  Mini-ML,  that  is  based  on  the 
core  language  of  Standard  ML  [90,  31].  Although  Mini-ML  is  a  fairly  limited  language, 
it has many of the constructs that one might find in a conventional functional programming
language, including integers and floating point values; first-class, lexically-scoped
functions;  tuples  (records);  and  polymorphism.  Indeed,  I  have  designed  Mini-ML  so  that 
it  brings  out  the  key  issues  one  must  address  when  compiling  a  modern  language  like 
SML. 

The syntax of Mini-ML is given in Figure 2.1. There are four basic syntactic classes:
monotypes, polytypes, values, and expressions. The monotypes of Mini-ML describe
expressions and consist of type variables (t), base types including int, float and unit,
and constructed types including (τ₁ × τ₂) and τ₁ -> τ₂. The monotypes are distinct
from types that contain a quantifier (∀). Polytypes, also referred to as type schemes,
are either monotypes or prenex, universally-quantified monotypes. Type variables range
over monotypes. Thus, polytypes are forbidden from instantiating a type variable.

Values consist of variables (x), integer and floating point literals (i and f), unit (()),
pairs of values, term functions (λx:τ.e), and type functions (Λt₁,···,tₙ.e). Term and type
functions are sometimes referred to as term or type abstractions, respectively.
Expressions contain variables, literals, unit, pairs of expressions, term functions, a structural
equality operation (eq(e₁, e₂)), a test for zero, projections (πᵢ e), and term applications
(e₁ e₂).

Expressions also include a def construct, which binds a variable to a value. Since
values include type abstractions, this provides a means for binding a type abstraction to
a variable, much like the let construct of SML. Finally, expressions include applications
of values to monotypes (v[τ₁,···,τₙ]). The typing rules, explained in Section 2.2, restrict
v to be either a def-bound variable or a Λ-abstraction. Hence, the only thing that can
be done with a Λ-abstraction is either bind it to a variable via def or apply it to some
monotypes. This means that, as in SML, type abstractions are “second-class” because
they cannot be placed in data structures, passed as arguments to functions, or returned
as the result of a function.

(types)        τ ::= t | int | float | unit | (τ₁ × τ₂) | τ₁ -> τ₂
(schemes)      σ ::= τ | ∀t₁,···,tₙ.τ
(values)       v ::= x | i | f | () | (v₁, v₂) | λx:τ.e | Λt₁,···,tₙ.e
(expressions)  e ::= x | i | f | λx:τ.e | e₁ e₂ | () | (e₁, e₂) | π₁ e | π₂ e |
                     eq(e₁, e₂) | if0 e₁ then e₂ else e₃ | def x:σ = v in e₂ |
                     v[τ₁,···,τₙ]

Figure 2.1: Syntax of Mini-ML

I use def instead of let because I reserve let as an abbreviation. In particular, I use
let x:τ = e₁ in e₂ as an abbreviation for (λx:τ. e₂) e₁.

Following conventional formulations of λ-calculus based languages [15], I consider the
variable x in λx:τ. e to be bound within the body of the function e. Likewise, I consider x
to be bound within e in the expression def x:σ = v in e, and the type variables t₁,···,tₙ
to be bound within the body of the expression Λt₁,···,tₙ.e. Likewise, t₁,···,tₙ are bound
within τ for the polytype ∀t₁,···,tₙ.τ. If a variable is not bound in an expression/type,
it is said to be free. I consider expressions/types to be equivalent modulo α-conversion
(i.e., systematic renaming) of the bound variables.

Finally, I write {e'/x}e to denote capture-avoiding substitution of the closed
expression e' for the variable x in the expression e. Likewise, I write {τ/t}σ to denote
capture-avoiding substitution of the monotype τ for t within the polytype σ.
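Capture-avoiding substitution on types, {τ/t}σ, can be sketched in Python as follows. The tuple encoding of monotypes and polytypes and the names free_vars and subst are assumptions made for illustration; bound variables that would capture a free variable of τ are renamed to a fresh name, which is justified because types are identified up to α-conversion.

```python
import itertools

# Assumed encoding: "int" | "float" | "unit" | ("tvar", name)
#                 | ("prod", a, b) | ("arrow", a, b) | ("forall", names, body)

def free_vars(ty):
    """The set of type variables occurring free in ty."""
    if isinstance(ty, str):
        return set()
    tag = ty[0]
    if tag == "tvar":
        return {ty[1]}
    if tag == "forall":
        return free_vars(ty[2]) - set(ty[1])
    return free_vars(ty[1]) | free_vars(ty[2])

def subst(tau, t, ty):
    """{tau/t}ty: replace free occurrences of t in ty by tau, avoiding capture."""
    if isinstance(ty, str):
        return ty
    tag = ty[0]
    if tag == "tvar":
        return tau if ty[1] == t else ty
    if tag == "forall":
        names, body = ty[1], ty[2]
        if t in names:
            return ty                      # t is rebound here, nothing to replace
        avoid = free_vars(tau) | free_vars(body) | set(names) | {t}
        renamed = []
        for n in names:
            if n in free_vars(tau):        # n would capture a variable of tau
                new = next(f"{n}{k}" for k in itertools.count()
                           if f"{n}{k}" not in avoid)
                avoid.add(new)
                body = subst(("tvar", new), n, body)
                n = new
            renamed.append(n)
        return ("forall", tuple(renamed), subst(tau, t, body))
    return (tag, subst(tau, t, ty[1]), subst(tau, t, ty[2]))
```

For example, substituting the type variable u for t in ∀u. t -> u first renames the bound u, so the result is ∀u0. u -> u0 rather than the incorrect ∀u. u -> u.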


2.1  Dynamic  Semantics  of  Mini-ML 

I  describe  evaluation  of  Mini-ML  programs  using  a  contextual  rewriting  semantics  in  the 
style  of  Felleisen  and  Hieb  [41].  This  kind  of  semantics  describes  evaluation  as  an  abstract 
machine  whose  states  are  expressions  and  whose  steps  are  functions,  or  more  generally, 
relations  between  expressions.  The  final  state  of  this  abstract  machine  is  a  closed  value. 
Each  step  of  the  abstract  machine  proceeds  according  to  a  simple  algorithm:  We  break 
the current expression into an evaluation context, $E$, and an instruction expression, $I$.
The evaluation context is an expression with a “hole” ($[\,]$) in the place of some
sub-expression. The original expression, $e$, is formed by replacing the hole in the context with
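\[
\begin{array}{lrcl}
\text{(contexts)} & E & ::= & [\,] \mid E_1\, e_2 \mid v_1\, E_2 \mid (E_1, e_2) \mid (v_1, E_2) \mid \pi_i\, E \mid {} \\
 & & & \mathrm{eq}(E_1, e_2) \mid \mathrm{eq}(v_1, E_2) \mid \mathrm{if0}\ E_1\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3 \\[1ex]
\text{(instructions)} & I & ::= & \mathrm{eq}(v_1, v_2) \mid \pi_i\, (v_1, v_2) \mid \mathrm{if0}\ i\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3 \mid (\lambda x{:}\tau.\, e)\, v \mid {} \\
 & & & \mathrm{def}\ x{:}\sigma = v\ \mathrm{in}\ e \mid (\Lambda t_1, \ldots, t_n.\, e)\, [\tau_1, \ldots, \tau_n]
\end{array}
\]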


Figure  2.2:  Contexts  and  Instructions  of  Mini-ML 


the instruction expression, denoted $e = E[I]$. Roughly speaking, the evaluation context
corresponds to the control state (or “stack”) of a conventional computer whereas the
instruction corresponds to the registers and program counter.¹ We replace the instruction
expression  with  a  result  expression  R  within  the  hole  of  the  context  to  form  a  new 
expression  e'  =  E[R].  This  new  expression  serves  as  the  next  expression  to  process  in 
the  evaluation  sequence. 
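
To make the decomposition concrete, here is one worked step. The particular expression is chosen purely for illustration and is not taken from the original text; it uses only the contexts and instructions of Figure 2.2.

```latex
\begin{align*}
e  &= (\lambda x{:}\mathrm{int}.\; x)\; (\pi_1\, (3, 4)) \\
E  &= (\lambda x{:}\mathrm{int}.\; x)\; [\,]
      && \text{(a context of the form } v_1\, E_2\text{)} \\
I  &= \pi_1\, (3, 4) \\
R  &= 3 \\
e' &= E[R] = (\lambda x{:}\mathrm{int}.\; x)\; 3
\end{align*}
```

The next step decomposes $e'$ with the empty context $E = [\,]$ and the instruction $I = (\lambda x{:}\mathrm{int}.\; x)\; 3$, which rewrites to the value $3$.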

The  expression  contexts  and  instructions  of  Mini-ML  are  given  in  Figure  2.2  and 
the  rewriting  rules  are  given  in  Figure  2.3.  The  form  of  the  evaluation  contexts  reflects 
the  fact  that  Mini-ML  evaluates  expressions  in  a  left-to-right,  inner-most  to  outer-most 
order. Furthermore, the context $v_1\, E_2$ shows that Mini-ML is an eager (as opposed to
lazy) language with respect to function application, because evaluation proceeds on the
argument  before  applying  the  function  to  it.  Similarly,  the  contexts  for  data  structures, 
namely  pairs,  show  that  these  data  structures  are  eager  with  respect  to  their  components. 

A def instruction is evaluated by substituting the value $v$ for all occurrences of the
variable $x$ within the scope of the def, $e$. Application of a $\Lambda$-expression to a set of
monotypes is evaluated by substituting the monotypes for the bound type variables
within the body of the abstraction.

Rewriting a primitive instruction is fairly straightforward with the exception of the
equality operation. In particular, the rewriting rule for equality must select the appropriate
function (e.g., $=_{\mathrm{int}}$ versus $=_{\mathrm{float}}$) according to the syntactic class of the values
given to the operation as arguments.

Evaluation does not proceed into the branches of an if0 construct. Instead, the first
component is evaluated and then one of the arms is selected according to the resulting
value. When rewriting an application, $(\lambda x{:}\tau.\, e)\, v$, we first substitute the value $v$ for the
free occurrences of the variable $x$ within the body of the function $e$. Likewise, when
rewriting a def construct, we substitute the value $v$ for the variable $x$ within the body
of the def.

¹ The relationship between contexts and instructions, and a stack and registers is made explicit in
Chapter 7.
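\[
\begin{array}{rcll}
E[\mathrm{eq}(i_1, i_2)] & \longmapsto & E[1] & (i_1 =_{\mathrm{int}} i_2) \\
E[\mathrm{eq}(i_1, i_2)] & \longmapsto & E[0] & (i_1 \neq_{\mathrm{int}} i_2) \\
E[\mathrm{eq}(f_1, f_2)] & \longmapsto & E[1] & (f_1 =_{\mathrm{float}} f_2) \\
E[\mathrm{eq}(f_1, f_2)] & \longmapsto & E[0] & (f_1 \neq_{\mathrm{float}} f_2) \\
E[\mathrm{eq}((v_1, v_2), (v_1', v_2'))] & \longmapsto & E[\mathrm{if0}\ \mathrm{eq}(v_1, v_1')\ \mathrm{then}\ 0\ \mathrm{else}\ \mathrm{eq}(v_2, v_2')] & \\
E[\mathrm{eq}(\lambda x_1{:}\tau_1.\, e_1,\ \lambda x_2{:}\tau_2.\, e_2)] & \longmapsto & E[0] & \\
E[\mathrm{if0}\ 0\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3] & \longmapsto & E[e_2] & \\
E[\mathrm{if0}\ i\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3] & \longmapsto & E[e_3] & (i \neq 0) \\
E[\pi_i\, (v_1, v_2)] & \longmapsto & E[v_i] & (i = 1, 2) \\
E[(\lambda x{:}\tau.\, e)\, v] & \longmapsto & E[\{v/x\}e] & \\
E[\mathrm{def}\ x{:}\sigma = v\ \mathrm{in}\ e] & \longmapsto & E[\{v/x\}e] & \\
E[(\Lambda t_1, \ldots, t_n.\, e)\, [\tau_1, \ldots, \tau_n]] & \longmapsto & E[\{\tau_1/t_1, \ldots, \tau_n/t_n\}e] &
\end{array}
\]
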


Figure  2.3:  Contextual  Dynamic  Semantics  for  Mini-ML 


Formally, I consider the rewriting rules to be relations between programs. I consider
evaluation to be the least relation formed by taking the reflexive, transitive closure of
these rules, denoted by $\longmapsto^*$. I define $e \Downarrow v$ to mean that $e \longmapsto^* v$, and $e \Uparrow$ to mean that
there exists an infinite sequence, $e \longmapsto e_1 \longmapsto e_2 \longmapsto \cdots$.
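
The rewriting rules can be read as an interpreter. The Python sketch below is only an illustration under assumptions of my own: expressions are encoded as tagged tuples, type annotations, unit, and type abstraction/application are omitted, and the names is_value, subst, step_eq, step, and evaluate are hypothetical rather than part of any implementation described in this thesis.

```python
# Mini-ML fragment as tagged tuples (an encoding assumed for this sketch):
#   ("int", n)  ("float", x)  ("var", name)  ("lam", x, body)
#   ("pair", e1, e2)  ("proj", i, e)  ("eq", e1, e2)
#   ("if0", e1, e2, e3)  ("app", e1, e2)  ("def", x, v, body)

def is_value(e):
    """Values: integers, floats, functions, and pairs of values."""
    if e[0] in ("int", "float", "lam"):
        return True
    return e[0] == "pair" and is_value(e[1]) and is_value(e[2])

def subst(v, x, e):
    """{v/x}e for a closed value v (closed, so no capture can occur)."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag in ("int", "float"):
        return e
    if tag == "lam":  # an inner binding of x shadows the outer one
        return e if e[1] == x else ("lam", e[1], subst(v, x, e[2]))
    if tag == "def":
        body = e[3] if e[1] == x else subst(v, x, e[3])
        return ("def", e[1], subst(v, x, e[2]), body)
    if tag == "proj":
        return ("proj", e[1], subst(v, x, e[2]))
    return (tag,) + tuple(subst(v, x, sub) for sub in e[1:])

def step_eq(a, b):
    """The eq instructions of Figure 2.3, selected by syntactic class."""
    if a[0] == "pair" and b[0] == "pair":
        return ("if0", ("eq", a[1], b[1]), ("int", 0), ("eq", a[2], b[2]))
    if a[0] == "lam" and b[0] == "lam":
        return ("int", 0)
    if a[0] == b[0] and a[0] in ("int", "float"):
        return ("int", 1 if a[1] == b[1] else 0)
    raise ValueError("stuck: eq on mismatched values")

def step(e):
    """One step e |-> e': find the instruction left-to-right and rewrite it."""
    tag = e[0]
    if tag in ("pair", "eq", "app"):
        a, b = e[1], e[2]
        if not is_value(a):            # contexts E1 e2, (E1, e2), eq(E1, e2)
            return (tag, step(a), b)
        if not is_value(b):            # contexts v1 E2, (v1, E2), eq(v1, E2)
            return (tag, a, step(b))
        if tag == "eq":
            return step_eq(a, b)
        if tag == "app" and a[0] == "lam":
            return subst(b, a[1], a[2])
    elif tag == "proj":
        if not is_value(e[2]):         # context pi_i E
            return ("proj", e[1], step(e[2]))
        if e[2][0] == "pair":
            return e[2][e[1]]          # index 1 or 2 selects the component
    elif tag == "if0":
        if not is_value(e[1]):         # context if0 E1 then e2 else e3
            return ("if0", step(e[1]), e[2], e[3])
        if e[1][0] == "int":
            return e[2] if e[1][1] == 0 else e[3]
    elif tag == "def":
        return subst(e[2], e[1], e[3])
    raise ValueError("value or stuck expression")

def evaluate(e):
    """Iterate |-> until a value is reached (may loop, or raise if stuck)."""
    while not is_value(e):
        e = step(e)
    return e
```

Folding the search for the evaluation context into the recursion of step keeps the sketch short; an implementation that represents $E$ explicitly would instead return the pair of context and instruction before rewriting.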


2.2  Static  Semantics  of  Mini-ML 

I formulate the static semantics for Mini-ML as a deductive system allowing us to derive
judgments of the form $\Delta; \Gamma \vdash e : \tau$ and $\Delta; \Gamma \vdash v : \sigma$. The first judgment means that
under the assumptions of $\Delta$ and $\Gamma$, the expression $e$ can be assigned the monotype $\tau$.
Similarly, the second judgment asserts that the value $v$ can be assigned the type scheme
$\sigma$ under the assumptions of $\Delta$ and $\Gamma$. Both judgments’ assumptions include a set of type
variables ($\Delta$) and a type assignment ($\Gamma$). The type assignment maps term variables to
type schemes, written $\{x_1{:}\sigma_1, \ldots, x_n{:}\sigma_n\}$. At most one type scheme is assigned to any
variable in an assignment. Therefore, we can think of $\Gamma$ as a partial function that maps
variables to types. I use the notation $\Gamma \uplus \{x{:}\tau\}$ to denote the type assignment obtained
by extending $\Gamma$ so that it maps $x$ to $\tau$, under the requirement that $x$ does not already
occur in the domain of $\Gamma$.

I assume that the free type variables of the range of $\Gamma$, the free type variables of $e$
and $v$, and the free type variables of $\tau$ are contained in $\Delta$. Hence, $\Delta$ tracks the set of
type variables that are in scope for the expression, value, or type. Similarly, the domain
of $\Gamma$ contains the set of free variables of $e$ and $v$ and thus tracks the set of term variables
that are in scope for the expression or value. I write $\Delta \vdash \sigma$ and $\Delta \vdash \Gamma$ to assert that $\sigma$
and $\Gamma$ are well-formed with respect to $\Delta$.

The axioms and inference rules that allow us to derive these judgments are given in
Figure 2.4. Most of the rules are standard, but a few deserve some explanation: If the
assumptions map $x$ to the scheme $\sigma$, then we can conclude that $x$ has the type $\sigma$. Note
that $x$ can be viewed as a value or an expression and $\sigma$ could be a monotype ($\tau$).

The eq rule requires that both arguments have the same type. For now, I make
no restriction to “equality types” (i.e., types not containing an arrow) as in SML.² The
dynamic semantics simply maps equality of two functional values to 0.

The most interesting rules are def, tapp, and tabs. The def rule allows us to
bind a polymorphic value $v$ to some variable within a closed scope $e$. If we assign the
value a quantified type, then we can only use the variable in another def binding or type
application. Consequently, polymorphic objects are “second-class”. The tapp rule allows
us to instantiate a polymorphic value of type $\forall t_1, \ldots, t_n.\, \tau$ with types $\tau_1, \ldots, \tau_n$. The
resulting expression has the monotype formed by replacing $t_i$ with $\tau_i$ in $\tau$. Finally, the
tabs rule assigns the scheme $\forall t_1, \ldots, t_n.\, \tau$ to the type abstraction $\Lambda t_1, \ldots, t_n.\, e$ if, adding
$t_1, \ldots, t_n$ to the assumptions in $\Delta$, we can conclude that $e$ has type $\tau$. Note that the
notation $\Delta \uplus \{t_1, \ldots, t_n\}$ precludes the $t_i$ from occurring in $\Delta$.
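
The typing rules are syntax-directed enough to be read as a checker. The following Python sketch is a rough illustration, not the formulation used in this thesis: the tuple encodings of types and expressions, the omission of unit and of the scheme annotation on def, and the names free_tvars, subst_ty, require, is_scheme, mono, and typeof are all assumptions made for the example. It also does not check that the bound expression of a def is syntactically a value.

```python
# Types (encoding assumed for this sketch):
#   "int"  "float"  ("tvar", t)  ("prod", t1, t2)  ("arrow", t1, t2)
#   ("forall", (t1, ..., tn), tau)        -- a type scheme
# Expressions:
#   ("int", n) ("float", x) ("var", x) ("eq", e1, e2) ("if0", e1, e2, e3)
#   ("pair", e1, e2) ("proj", i, e) ("lam", x, tau, e) ("app", e1, e2)
#   ("def", x, v, e) ("tapp", v, (tau1, ..., taun)) ("tabs", (t1, ..., tn), e)

def free_tvars(ty):
    """Free type variables of a monotype."""
    if isinstance(ty, str):
        return set()
    if ty[0] == "tvar":
        return {ty[1]}
    return free_tvars(ty[1]) | free_tvars(ty[2])      # prod or arrow

def subst_ty(mapping, ty):
    """Simultaneous substitution of monotypes for type variables in a monotype."""
    if isinstance(ty, str):
        return ty
    if ty[0] == "tvar":
        return mapping.get(ty[1], ty)
    return (ty[0], subst_ty(mapping, ty[1]), subst_ty(mapping, ty[2]))

def require(cond, msg):
    if not cond:
        raise TypeError(msg)

def is_scheme(ty):
    return not isinstance(ty, str) and ty[0] == "forall"

def mono(delta, gamma, e):
    """Delta; Gamma |- e : tau, insisting that the result is a monotype."""
    ty = typeof(delta, gamma, e)
    require(not is_scheme(ty), "polymorphic value used as a first-class object")
    return ty

def typeof(delta, gamma, e):
    tag = e[0]
    if tag == "var":
        require(e[1] in gamma, "unbound variable")
        return gamma[e[1]]
    if tag in ("int", "float"):
        return tag
    if tag == "eq":
        require(mono(delta, gamma, e[1]) == mono(delta, gamma, e[2]), "eq: types differ")
        return "int"
    if tag == "if0":
        require(mono(delta, gamma, e[1]) == "int", "if0: test must be int")
        ty = mono(delta, gamma, e[2])
        require(ty == mono(delta, gamma, e[3]), "if0: arms differ")
        return ty
    if tag == "pair":
        return ("prod", mono(delta, gamma, e[1]), mono(delta, gamma, e[2]))
    if tag == "proj":
        ty = mono(delta, gamma, e[2])
        require(not isinstance(ty, str) and ty[0] == "prod", "proj: not a pair")
        return ty[e[1]]
    if tag == "lam":
        require(free_tvars(e[2]) <= delta, "lam: type variable not in scope")
        return ("arrow", e[2], mono(delta, {**gamma, e[1]: e[2]}, e[3]))
    if tag == "app":
        fn, arg = mono(delta, gamma, e[1]), mono(delta, gamma, e[2])
        require(not isinstance(fn, str) and fn[0] == "arrow" and fn[1] == arg, "app: mismatch")
        return fn[2]
    if tag == "def":      # the bound value may receive a type scheme
        return mono(delta, {**gamma, e[1]: typeof(delta, gamma, e[2])}, e[3])
    if tag == "tapp":
        scheme = typeof(delta, gamma, e[1])
        require(is_scheme(scheme) and len(scheme[1]) == len(e[2]), "tapp: bad instantiation")
        require(all(free_tvars(t) <= delta for t in e[2]), "tapp: ill-formed type argument")
        return subst_ty(dict(zip(scheme[1], e[2])), scheme[2])
    if tag == "tabs":
        require(not (set(e[1]) & delta), "tabs: type variable already in scope")
        return ("forall", tuple(e[1]), mono(delta | set(e[1]), gamma, e[2]))
    raise TypeError("unknown expression form")
```

Calling typeof(set(), {}, e) corresponds to the closed judgment with empty assumptions. The helper mono is where the “second-class” status of polymorphic values shows up: every position other than a def binding or the head of a type application demands a monotype.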

I write $\vdash e : \sigma$ if $\emptyset; \emptyset \vdash e : \sigma$ is derivable from these axioms and inference rules. The
following lemmas summarize the key properties of the static semantics for Mini-ML.

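Lemma 2.2.1 (Type Substitution) If $\Delta \uplus \{t\}; \Gamma \vdash e : \sigma$ and $\Delta \vdash \tau$, then
$\Delta; \{\tau/t\}(\Gamma) \vdash \{\tau/t\}(e) : \{\tau/t\}(\sigma)$.

Proof (sketch): By induction on the derivation of $\Delta \uplus \{t\}; \Gamma \vdash e : \sigma$. □

Lemma 2.2.2 (Term Substitution) If $\Delta; \Gamma \uplus \{x{:}\sigma'\} \vdash e : \sigma$ and $\Delta; \Gamma \vdash e' : \sigma'$, then
$\Delta; \Gamma \vdash \{e'/x\}e : \sigma$.

Proof (sketch): By induction on $\Delta; \Gamma \uplus \{x{:}\sigma'\} \vdash e : \sigma$. Simply replace all occurrences
of the var rule used to assign $x$ the type $\sigma'$ with the derivation of $\Delta; \Gamma \vdash e' : \sigma'$. □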

Lemma 2.2.3 (Canonical Forms) Suppose $\vdash v : \sigma$. Then if $\sigma$ is:

• int, then $v$ is some integer $i$.

• float, then $v$ is some floating-point value $f$.

• unit, then $v$ is ().

² I address the issue of equality types in Section 5.4.2.
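\[
\begin{array}{ll}
\text{(var)} & \Delta; \Gamma \uplus \{x{:}\sigma\} \vdash x : \sigma \\[1.5ex]
\text{(int)} & \Delta; \Gamma \vdash i : \mathrm{int} \\[1.5ex]
\text{(float)} & \Delta; \Gamma \vdash f : \mathrm{float} \\[1.5ex]
\text{(eq)} & \dfrac{\Delta; \Gamma \vdash e_1 : \tau \qquad \Delta; \Gamma \vdash e_2 : \tau}
                    {\Delta; \Gamma \vdash \mathrm{eq}(e_1, e_2) : \mathrm{int}} \\[3ex]
\text{(if0)} & \dfrac{\Delta; \Gamma \vdash e_1 : \mathrm{int} \qquad \Delta; \Gamma \vdash e_2 : \tau \qquad \Delta; \Gamma \vdash e_3 : \tau}
                     {\Delta; \Gamma \vdash \mathrm{if0}\ e_1\ \mathrm{then}\ e_2\ \mathrm{else}\ e_3 : \tau} \\[3ex]
\text{(unit)} & \Delta; \Gamma \vdash () : \mathrm{unit} \\[1.5ex]
\text{(pair)} & \dfrac{\Delta; \Gamma \vdash e_1 : \tau_1 \qquad \Delta; \Gamma \vdash e_2 : \tau_2}
                      {\Delta; \Gamma \vdash (e_1, e_2) : (\tau_1 \times \tau_2)} \\[3ex]
\text{(proj)} & \dfrac{\Delta; \Gamma \vdash e : (\tau_1 \times \tau_2)}
                      {\Delta; \Gamma \vdash \pi_i\, e : \tau_i} \quad (i = 1, 2) \\[3ex]
\text{(abs)} & \dfrac{\Delta; \Gamma \uplus \{x{:}\tau_1\} \vdash e : \tau_2}
                     {\Delta; \Gamma \vdash \lambda x{:}\tau_1.\, e : \tau_1 \rightarrow \tau_2} \\[3ex]
\text{(app)} & \dfrac{\Delta; \Gamma \vdash e_1 : \tau_1 \rightarrow \tau_2 \qquad \Delta; \Gamma \vdash e_2 : \tau_1}
                     {\Delta; \Gamma \vdash e_1\, e_2 : \tau_2} \\[3ex]
\text{(def)} & \dfrac{\Delta; \Gamma \vdash v : \sigma \qquad \Delta; \Gamma \uplus \{x{:}\sigma\} \vdash e : \tau}
                     {\Delta; \Gamma \vdash \mathrm{def}\ x{:}\sigma = v\ \mathrm{in}\ e : \tau} \\[3ex]
\text{(tapp)} & \dfrac{\Delta \vdash \tau_1 \ \cdots\ \Delta \vdash \tau_n \qquad \Delta; \Gamma \vdash v : \forall t_1, \ldots, t_n.\, \tau}
                      {\Delta; \Gamma \vdash v[\tau_1, \ldots, \tau_n] : \{\tau_1/t_1, \ldots, \tau_n/t_n\}\tau} \\[3ex]
\text{(tabs)} & \dfrac{\Delta \uplus \{t_1, \ldots, t_n\}; \Gamma \vdash e : \tau}
                      {\Delta; \Gamma \vdash \Lambda t_1, \ldots, t_n.\, e : \forall t_1, \ldots, t_n.\, \tau}
\end{array}
\]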


Figure  2.4:  Static  Semantics  for  Mini-ML 
• $(\tau_1 \times \tau_2)$, then $v$ is $(v_1, v_2)$, for some $v_1$ and $v_2$.

• $\tau_1 \rightarrow \tau_2$, then $v$ is $\lambda x{:}\tau_1.\, e$, for some $x$ and $e$.

• $\forall t_1, \ldots, t_n.\, \tau$, then $v$ is $\Lambda t_1, \ldots, t_n.\, e$, for some $e$.


Proof: By an examination of the typing rules. □

Lemma 2.2.4 (Unique Decomposition) If $\vdash e : \sigma$, then either $e$ is a value $v$ or else
there exists a unique $E$, $e'$, and $\sigma'$ such that $e = E[e']$ and $\vdash e' : \sigma'$. Furthermore, for all
$e''$ such that $\vdash e'' : \sigma'$, $\vdash E[e''] : \sigma$.

Proof (sketch): By induction on the derivation of $\vdash e : \sigma$. Suppose $e$ is not a value.
There are seven cases to consider. I give one of the cases here. The other cases follow in
a similar fashion.

case: $e$ is $e_1\, e_2$ for some $e_1$ and $e_2$. Then a derivation of $\vdash e : \sigma$ must end with a use
of the app rule. Hence, there exist $\tau_1$ and $\tau_2$ such that $\vdash e_1 : \tau_1 \rightarrow \tau_2$, $\vdash e_2 : \tau_1$ and
$\sigma = \tau_2$. By induction, either $e_1$ is a value or else there exist unique $E_1$, $e_1'$, and $\sigma_1'$ such
that $e_1 = E_1[e_1']$ and $\vdash e_1' : \sigma_1'$. If $e_1$ is not a value, then we take $E = E_1\, e_2$, $e' = e_1'$, and
$\sigma' = \sigma_1'$. Otherwise, $e_1 = v_1$ for some $v_1$. By induction, either $e_2$ is a value or else there
exist unique $E_2$, $e_2'$ and $\sigma_2'$ such that $e_2 = E_2[e_2']$ and $\vdash e_2' : \sigma_2'$. If $e_2$ is not a value, then
$E = v_1\, E_2$, $e' = e_2'$, and $\sigma' = \sigma_2'$. Otherwise, $e_2 = v_2$ for some $v_2$. Thus, $E = [\,]$, $e' = v_1\, v_2$
and $\sigma' = \sigma$. □


Lemma 2.2.5 (Preservation) If $\vdash e : \sigma$ and $e \longmapsto e'$, then $\vdash e' : \sigma$.

Proof: By Unique Decomposition and the fact that $e \longmapsto e'$, there exists a unique $E$,
$I$, and $e''$ such that $e = E[I]$, $I \longmapsto e''$, and $e' = E[e'']$. Furthermore, there exists a $\sigma'$
such that $\vdash I : \sigma'$ and for all $e'''$ such that $\vdash e''' : \sigma'$, $\vdash E[e'''] : \sigma$. Hence, it suffices to
show that regardless of $I$ and $e''$, $\vdash e'' : \sigma'$. There are six cases to consider:

case: If $I = \mathrm{eq}(v_1, v_2)$ then a derivation of $\vdash I : \sigma'$ must end with an application of the
eq rule. Hence, $\sigma' = \mathrm{int}$ and there exists a $\tau$ such that $\vdash v_1 : \tau$ and $\vdash v_2 : \tau$. If $v_1$ and
$v_2$ are integers, floats, or $\lambda$-abstractions, then $I \longmapsto i$ for some $i$ and $\vdash i : \sigma'$. If $v_1$ and
$v_2$ are pairs, $(v_a, v_b)$ and $(v_a', v_b')$, then $I \longmapsto \mathrm{if0}\ \mathrm{eq}(v_a, v_a')\ \mathrm{then}\ 0\ \mathrm{else}\ \mathrm{eq}(v_b, v_b')$. By
examination of the typing rules, $\tau$ must be of the form $(\tau_a \times \tau_b)$. Since $\vdash v_1 : (\tau_a \times \tau_b)$
and $\vdash v_2 : (\tau_a \times \tau_b)$, derivations of these facts must end with a use of the pair rule.
Hence, $\vdash v_a : \tau_a$, $\vdash v_b : \tau_b$, $\vdash v_a' : \tau_a$, and $\vdash v_b' : \tau_b$. By the eq rule, $\vdash \mathrm{eq}(v_a, v_a') : \mathrm{int}$
and $\vdash \mathrm{eq}(v_b, v_b') : \mathrm{int}$. Thus, by the if0 rule, $\vdash \mathrm{if0}\ \mathrm{eq}(v_a, v_a')\ \mathrm{then}\ 0\ \mathrm{else}\ \mathrm{eq}(v_b, v_b') : \mathrm{int}$.
Hence, $\vdash e'' : \sigma'$.

case: If $I = \mathrm{if0}\ i\ \mathrm{then}\ e_1\ \mathrm{else}\ e_2$, then a derivation of $\vdash I : \sigma'$ must end with an
application of the if0 rule. Hence, there exists a $\tau$ such that $\vdash e_1 : \tau$ and $\vdash e_2 : \tau$ and
$\sigma' = \tau$. If $i$ is 0, then $I \longmapsto e_1$ else $I \longmapsto e_2$. Regardless, the resulting expression has
type $\sigma'$.

case: If $I = \pi_i\, v$ for $i = 1, 2$, then a derivation of $\vdash I : \sigma'$ must end with an application
of the proj rule. Hence, there exist $\tau_1$ and $\tau_2$ such that $\vdash v : (\tau_1 \times \tau_2)$ and $\sigma' = \tau_i$.
By Canonical Forms, $v$ must be of the form $(v_1, v_2)$ and $I \longmapsto v_i$. A derivation of
$\vdash (v_1, v_2) : (\tau_1 \times \tau_2)$ must end with the pair rule, hence $\vdash v_i : \tau_i$ and $\vdash e'' : \sigma'$.

case: If $I = (\lambda x{:}\tau_1.\, e_1)\, v$ then $e'' = \{v/x\}e_1$. A derivation of $\vdash I : \sigma'$ must end with a
use of the app rule. Hence, $\sigma' = \tau_2$ for some $\tau_2$ and $\vdash \lambda x{:}\tau_1.\, e_1 : \tau_1 \rightarrow \tau_2$ and $\vdash v : \tau_1$.
By Term Substitution, $\vdash \{v/x\}e_1 : \tau_2$. Hence, $\vdash e'' : \sigma'$.

case: If $I = (\Lambda t_1, \ldots, t_n.\, e_1)\, [\tau_1, \ldots, \tau_n]$ then $e'' = \{\tau_1/t_1, \ldots, \tau_n/t_n\}e_1$. A derivation of
$\vdash I : \sigma'$ must end with a use of the tapp rule. Hence, $\sigma' = \{\tau_1/t_1, \ldots, \tau_n/t_n\}\tau$ for some $\tau$
such that $\vdash \Lambda t_1, \ldots, t_n.\, e_1 : \forall t_1, \ldots, t_n.\, \tau$. By Type Substitution, $\vdash \{\tau_1/t_1, \ldots, \tau_n/t_n\}e_1 :
\{\tau_1/t_1, \ldots, \tau_n/t_n\}\tau$. Thus, $\vdash e'' : \sigma'$. □

Lemma 2.2.6 (Progress) If $\vdash e : \sigma$, then either $e$ is a value or else there exists some
$e'$ such that $e \longmapsto e'$.

Proof: If $e$ is not a value, then by Unique Decomposition, there exists an $E$, $e_1$, and $\sigma'$
such that $e = E[e_1]$ and $\vdash e_1 : \sigma'$. I argue that $e_1$ must be an instruction and hence, there
is an $e_2$ such that $e_1 \longmapsto e_2$ and thus $E[e_1] \longmapsto E[e_2]$. There are five cases to consider,
where $e_1$ could possibly be stuck.

case: If $e_1$ is of the form $e_a\, e_b$ then $e_a$ and $e_b$ must both be values, else by Unique
Decomposition, $e_1$ can be broken into a nested evaluation context and expression. Thus,
$e_1 = v_1\, v_2$ for some $v_1$ and $v_2$. Since $\vdash v_1\, v_2 : \sigma'$, the derivation must end in an application
of the app rule. Thus, there exist a $\tau'$ and $\tau$ such that $\vdash v_1 : \tau' \rightarrow \tau$ and $\vdash v_2 : \tau'$ and
$\sigma' = \tau$. Since $\vdash v_1 : \tau' \rightarrow \tau$, $v_1$ is closed, and by Canonical Forms, $v_1$ must be of the form
$\lambda x{:}\tau'.\, e''$ for some $x$ and $e''$. Thus, $e_1 \longmapsto \{v_2/x\}e''$.

case: If $e_1$ is of the form $\pi_i\, e_a$ for $i = 1, 2$, then $e_a$ must be a value, else by Unique
Decomposition, $e_1$ can be broken into a nested evaluation context and expression. Thus,
$e_a = v$ for some $v$. Since $\vdash \pi_i\, v : \sigma'$, by an examination of the typing rules, a derivation
of this fact must end with a use of the proj rule. Hence, $\vdash v : \tau_1 \times \tau_2$ for some $\tau_1$ and
$\tau_2$ such that $\tau_i = \sigma'$. By Canonical Forms, there exist two values $v_1$ and $v_2$ such that
$v = (v_1, v_2)$. Hence, $e_1 \longmapsto v_i$.

case: If $e_1$ is of the form $\mathrm{eq}(e_a, e_b)$ then $e_a$ and $e_b$ must be values, $v_1$ and $v_2$. Since
$\vdash e_1 : \sigma'$, a derivation of this fact must end with the eq rule. Hence, $\sigma'$ is int and there
exists a $\tau$ such that $\vdash v_1 : \tau$ and $\vdash v_2 : \tau$. By Canonical Forms, $v_1$ and $v_2$ are both either
integers, floats, pairs, or functions. Hence, $e_1 \longmapsto i$ for some $i$ or, when both are pairs,
$e_1$ steps to the if0 expression given by the pair rule of Figure 2.3.

case: If $e_1$ is of the form $\mathrm{if0}\ e_a\ \mathrm{then}\ e_b\ \mathrm{else}\ e_c$, then $e_a$ must be a value $v$. Since
$\vdash e_1 : \sigma'$, a derivation of this fact must end with a use of the if0 rule. Hence, $\vdash v : \mathrm{int}$.
By Canonical Forms, $v$ is some integer $i$. If $i$ is 0, then $e_1 \longmapsto e_b$ else $e_1 \longmapsto e_c$.

case: If $e_1$ is of the form $v\, [\tau_1, \ldots, \tau_n]$, then since $\vdash e_1 : \sigma'$, a derivation of this fact
must end with a use of the tapp rule. Hence, there exists some $\forall t_1, \ldots, t_n.\, \tau$ such that
$\vdash v : \forall t_1, \ldots, t_n.\, \tau$, where $\sigma' = \{\tau_1/t_1, \ldots, \tau_n/t_n\}\tau$. By Canonical Forms, $v$ must be
$\Lambda t_1, \ldots, t_n.\, e''$ for some $e''$. Hence, $e_1 \longmapsto \{\tau_1/t_1, \ldots, \tau_n/t_n\}e''$. □

This  last  lemma  implies  that  well-typed  Mini-ML  expressions  cannot  “get  stuck” 
during  evaluation.  From  a  practical  standpoint,  this  means  that  it  is  impossible  to  have 
a  well-typed  program  that  attempts  to  apply  a  non-function  to  some  arguments,  or  to 
project  a  component  from  a  non-tuple.  Therefore,  any  implementation  that  accurately 
reflects  the  dynamic  semantics  will  never  “dump  core”  when  given  a  well-typed  program. 

Corollary 2.2.7 (Soundness) If $\vdash e : \sigma$, then either $e \Uparrow$ or else there exists some $v$
such that $e \Downarrow v$ and $\vdash v : \sigma$.

Proof: By induction on the number of rewriting steps, if $e \longmapsto^* e'$, then by Preservation,
$\vdash e' : \sigma$ and by Progress, either $e'$ is a value or else there exists an $e''$ such that
$e' \longmapsto e''$. Therefore, either there exists an infinite sequence, $e \longmapsto^* e' \longmapsto e_1 \longmapsto e_2 \longmapsto \cdots$,
or else $e \Downarrow v$ and $\vdash v : \sigma$. □


Chapter  3 

A  Calculus  of  Dynamic  Type 
Dispatch 


I argued in Chapter 1 that compiling a polymorphic language without sacrificing control
over data representations or the ability to compile modules separately requires an
intermediate language that supports dynamic type dispatch. In this chapter, I present
a core calculus called $\lambda_i^{ML}$ that provides a formal foundation for dynamic type dispatch.
In subsequent chapters, I derive intermediate languages based on this formal calculus
and show how to compile Mini-ML to these lower-level languages, taking advantage of
dynamic type dispatch to implement various language features.

3.1 Syntax of $\lambda_i^{ML}$

$\lambda_i^{ML}$ is based on $\lambda^{ML}$ [94], a predicative variant of the Girard-Reynolds polymorphic
calculus, $F_\omega$ [47, 46, 106]. The essential departure from the impredicative systems of Girard
and Reynolds is that, as in Mini-ML, there is a distinction made between monotypes
(types without a quantifier) and polytypes, and type variables are only allowed to range
over monotypes. Such a calculus is more than sufficient for the interpretation of ML-style
polymorphism¹ and makes arguments based on logical relations easier than in an
impredicative calculus. The language $\lambda_i^{ML}$ extends $\lambda^{ML}$ with intensional (or structural [52])
polymorphism, which allows non-parametric functions to be defined via intensional
analysis of types.

The four syntactic classes of $\lambda_i^{ML}$ are given in Figure 3.1. The expressions of the
language are described by types. Types include int, function types, explicitly injected
constructors ($T(\mu)$) and polymorphic types ($\forall t{::}\kappa.\, \sigma$). Types that do not include a
quantifier are called monotypes, whereas types that do include a quantifier are called polytypes.
The language easily extends to float, products, and inductively generated types like lists;
I omit these here to simplify the formal treatment of the calculus.

¹ See Harper and Mitchell [94] for further discussion of this point.

The constructors of $\lambda_i^{ML}$ form a language that is isomorphic to a simply-typed
$\lambda$-calculus extended with a single, inductively defined base type (such as lists or trees) and
an induction elimination form (such as fold). In this case, the inductively defined base
type is given by the set of constructor values which are generated as follows:

\[ \tau ::= \mathrm{Int} \mid \mathrm{Arrow}(\tau_1, \tau_2). \]

Each constructor value $\tau$ names a monotype $\sigma$. In particular, Int is a constructor value
that names the type int. If $\tau_1$ names the type $\sigma_1$ and $\tau_2$ names the type $\sigma_2$, then
$\mathrm{Arrow}(\tau_1, \tau_2)$ names the type $\sigma_1 \rightarrow \sigma_2$.

To distinguish expression-level types from constructor-level types, I call the latter
kinds ($\kappa$). Closed constructors of kind $\Omega$ compute constructor values. If $\mu$ computes the
constructor value $\tau$, and $\tau$ names the monotype $\sigma$, then I use the explicit injection $T(\mu)$
to denote the monotype $\sigma$. The precise relationship between constructors and monotypes
is axiomatized in Section 3.3.

As in standard polymorphic calculi, constructor abstractions ($\Lambda t{::}\kappa.\, e$) let us define
functions from constructors to terms. Unlike languages based on the Hindley-Milner type
system including Mini-ML, SML, and Haskell, I do not restrict constructor abstractions
to a “second-class” status. This is reflected in the types of the language, because there
is no prenex-quantifier restriction. Hence, constructor abstractions can be placed in data
structures, passed as arguments, or returned from functions.

The Typerec and typerec forms give us the ability to define both constructors and
terms by structural induction on monotypes. The Typerec and typerec forms may be
thought of as eliminatory forms for the kind $\Omega$ at the constructor and term level
respectively. The introductory forms are the constructors of kind $\Omega$; there are no introductory
forms at the term level in order to preserve the phase distinction [25, 60]. In effect, Typerec
and typerec let us fold some computation over a monotype. Limiting the computation
to a fold, instead of some general recursion, ensures that the computation terminates,
a crucial property at the constructor level. Nevertheless, many useful operations, including
pattern matching, iterators, maps, and reductions can be coded using folds.

I consider $\lambda$ to bind the type variable $t$ within a constructor function, $\lambda t{::}\kappa.\, \mu$. I also
consider $\forall t{::}\kappa.\, \sigma$ to bind $t$ within the scope of $\sigma$. I consider $\lambda$ to bind the expression
variable $x$ within an expression function, $\lambda x{:}\sigma.\, e$. I consider $\Lambda$ to bind the type variable
$t$ within a constructor abstraction $\Lambda t{::}\kappa.\, e$. Finally, I consider $t$ to be bound in $\sigma$ for the
“$[t.\sigma]$” portion of a typerec expression. This type scheme on typerecs is needed to make
the language explicitly typed. As usual, I consider constructors, types, and expressions
to be equivalent modulo $\alpha$-conversion of bound variables.
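\[
\begin{array}{lrcl}
\text{(kinds)} & \kappa & ::= & \Omega \mid \kappa_1 \rightarrow \kappa_2 \\[1ex]
\text{(constructors)} & \mu & ::= & t \mid \mathrm{Int} \mid \mathrm{Arrow}(\mu_1, \mu_2) \mid \lambda t{::}\kappa.\, \mu \mid \mu_1\, \mu_2 \mid {} \\
 & & & \mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}}) \\[1ex]
\text{(types)} & \sigma & ::= & T(\mu) \mid \mathrm{int} \mid \sigma_1 \rightarrow \sigma_2 \mid \forall t{::}\kappa.\, \sigma \\[1ex]
\text{(expressions)} & e & ::= & x \mid i \mid \lambda x{:}\sigma.\, e \mid \Lambda t{::}\kappa.\, e \mid e_1\, e_2 \mid e[\mu] \mid {} \\
 & & & \mathrm{typerec}\ \mu\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}})
\end{array}
\]

Figure 3.1: Syntax of $\lambda_i^{ML}$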

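\[
\begin{array}{lrcl}
\text{(values)} & u & ::= & \mathrm{Int} \mid \mathrm{Arrow}(u_1, u_2) \mid \lambda t{::}\kappa.\, \mu \\[1ex]
\text{(contexts)} & U & ::= & [\,] \mid U\, \mu \mid u\, U \mid \mathrm{Typerec}\ U\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}}) \\[1ex]
\text{(instructions)} & J & ::= & (\lambda t{::}\kappa.\, \mu)\, u \mid \mathrm{Typerec}\ \mathrm{Int}\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}}) \mid {} \\
 & & & \mathrm{Typerec}\ \mathrm{Arrow}(u_1, u_2)\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}})
\end{array}
\]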

Figure  3.2:  Values,  Contexts,  and  Instructions  of  Constructors 

3.2 Dynamic Semantics of $\lambda_i^{ML}$

The dynamic semantics for $\lambda_i^{ML}$ consists of a set of rewriting rules for both constructors
and expressions. I use a contextual semantics to describe evaluation at both levels.

The values, evaluation contexts, and instructions for constructors are given in
Figure 3.2. I choose to evaluate constructors in a call-by-value fashion, though either
call-by-name or call-by-need would also be appropriate. Therefore, the values of constructors
consist of variables, functions, Int, or Arrow constructors with value components. The
evaluation contexts of constructors consist of a hole, an application with a hole
somewhere in the function position, an application of a value to a constructor with a hole in the
argument position, or a Typerec with a hole somewhere in the argument. The instructions
consist of an application of a function to a value, or a Typerec where the argument
constructor is either Int or Arrow.

The  rewriting  rules  for  constructors  are  given  in  Figure  3.3.  The  rule  for  function 
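\[
\begin{array}{rcl}
U[(\lambda t{::}\kappa.\, \mu_1)\, u_2] & \longmapsto & U[\{u_2/t\}\mu_1] \\[1ex]
U[\mathrm{Typerec}\ \mathrm{Int}\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}})] & \longmapsto & U[\mu_{\mathrm{int}}] \\[1ex]
U[\mathrm{Typerec}\ \mathrm{Arrow}(u_1, u_2)\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}})] & \longmapsto & \\
\multicolumn{3}{r}{U[\mu_{\mathrm{arrow}}\ u_1\ u_2\ (\mathrm{Typerec}\ u_1\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}}))\ (\mathrm{Typerec}\ u_2\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}}))]}
\end{array}
\]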

Figure  3.3:  Rewriting  Rules  for  Constructors 


application is straightforward. The rules for Typerec select the appropriate clause
according to the head component of the argument constructor. Thus, $\mu_{\mathrm{int}}$ is chosen if the
argument is Int, while $\mu_{\mathrm{arrow}}$ is chosen if the argument is $\mathrm{Arrow}(u_1, u_2)$. If the argument
constructor has components, then we pass these components as arguments to the clause.
We also pass the “unrolling” of the Typerec on these components. For instance, if
the argument constructor is $\mathrm{Arrow}(u_1, u_2)$, then we pass $u_1$, $u_2$, and the same Typerec
applied to $u_1$ and $u_2$ to the $\mu_{\mathrm{arrow}}$ clause. In this fashion, the Typerec is folded across the
components of a constructor.
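
The folding behaviour of Typerec on constructor values can be sketched in a few lines of Python. This is an illustration only: the tuple encoding of constructor values, the use of Python functions for the arrow clause, and the names typerec, arrow_count, and argument_of are assumptions of the sketch, not notation from the calculus.

```python
# Constructor values (encoding assumed for this sketch):
#   "Int"  or  ("Arrow", u1, u2)

def typerec(u, on_int, on_arrow):
    """Fold over a constructor value, following the two Typerec rules.

    on_int plays the role of the Int clause; on_arrow plays the role of the
    Arrow clause and receives the two components followed by the unrolled
    results of the same fold on each component.
    """
    if u == "Int":
        return on_int
    _, u1, u2 = u
    return on_arrow(u1, u2,
                    typerec(u1, on_int, on_arrow),
                    typerec(u2, on_int, on_arrow))

def arrow_count(u):
    """A reduction: the number of Arrow nodes in u."""
    return typerec(u, 0, lambda u1, u2, r1, r2: 1 + r1 + r2)

def argument_of(u):
    """Pattern matching coded as a fold: the domain of an Arrow, else Int."""
    return typerec(u, "Int", lambda u1, u2, r1, r2: u1)
```

arrow_count uses only the unrolled results, whereas argument_of uses only the components; the Arrow clause always receives both, which is why a single fold covers reductions as well as simple pattern matching.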

The values, evaluation contexts, and instructions for the expressions of $\lambda_i^{ML}$ are given
in Figure 3.4. The evaluation contexts and values show that $\lambda_i^{ML}$ expressions are evaluated
in a standard call-by-value fashion. As at the constructor level, I choose to evaluate
constructor application eagerly. Hence, a constructor is reduced to a constructor value before
it is substituted for the $\Lambda$-bound type variable of a constructor abstraction. Evaluation
of an expression-level typerec is similar to the evaluation of a constructor-level Typerec.
First, the argument is evaluated and then the appropriate clause, either $e_{\mathrm{int}}$ or $e_{\mathrm{arrow}}$, is
chosen according to the head of the resulting constructor value. Any nested constructor
components are passed as constructor arguments to the clause as well as the “unrolling” of
the typerec on these components. Hence, evaluation of a typerec applied to the constructor
$\mathrm{Arrow}(u_1, u_2)$ selects the $e_{\mathrm{arrow}}$ clause, passes it $u_1$ and $u_2$ as constructor arguments and the same
typerec applied to $u_1$ and $u_2$ as value arguments.
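\[
\begin{array}{lrcl}
\text{(values)} & v & ::= & i \mid \lambda x{:}\sigma.\, e \mid \Lambda t{::}\kappa.\, e \\[1ex]
\text{(contexts)} & E & ::= & [\,] \mid E_1\, e_2 \mid v_1\, E_2 \mid E[\mu] \\[1ex]
\text{(instructions)} & I & ::= & (\lambda x{:}\sigma.\, e)\, v \mid (\Lambda t{::}\kappa.\, e)[U[J]] \mid (\Lambda t{::}\kappa.\, e)[u] \mid {} \\
 & & & \mathrm{typerec}\ U[J]\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}}) \mid {} \\
 & & & \mathrm{typerec}\ \mathrm{Int}\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}}) \mid {} \\
 & & & \mathrm{typerec}\ \mathrm{Arrow}(u_1, u_2)\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}})
\end{array}
\]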


Figure  3.4:  Values,  Contexts,  and  Instructions  of  Expressions 


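\[
\begin{array}{l}
E[(\lambda x{:}\sigma.\, e)\, v] \longmapsto E[\{v/x\}e] \\[1ex]
E[(\Lambda t{::}\kappa.\, e)[U[J]]] \longmapsto E[(\Lambda t{::}\kappa.\, e)[U[\mu]]] \quad \text{when } U[J] \longmapsto U[\mu] \\[1ex]
E[(\Lambda t{::}\kappa.\, e)[u]] \longmapsto E[\{u/t\}e] \\[1ex]
E[\mathrm{typerec}\ U[J]\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}})] \longmapsto \\
\qquad E[\mathrm{typerec}\ U[\mu]\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}})] \quad \text{when } U[J] \longmapsto U[\mu] \\[1ex]
E[\mathrm{typerec}\ \mathrm{Int}\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}})] \longmapsto E[e_{\mathrm{int}}] \\[1ex]
E[\mathrm{typerec}\ \mathrm{Arrow}(u_1, u_2)\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}})] \longmapsto \\
\qquad E[e_{\mathrm{arrow}}\ [u_1]\ [u_2]\ (\mathrm{typerec}\ u_1\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}})) \\
\qquad\qquad (\mathrm{typerec}\ u_2\ \mathrm{of}\ [t.\sigma](e_{\mathrm{int}}; e_{\mathrm{arrow}}))]
\end{array}
\]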

Figure  3.5:  Rewriting  Rules  for  Expressions 
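\[
\begin{array}{ll}
\text{(var)} & \Delta \uplus \{t{::}\kappa\} \vdash t :: \kappa \\[1.5ex]
\text{(int)} & \Delta \vdash \mathrm{Int} :: \Omega \\[1.5ex]
\text{(arrow)} & \dfrac{\Delta \vdash \mu_1 :: \Omega \qquad \Delta \vdash \mu_2 :: \Omega}
                       {\Delta \vdash \mathrm{Arrow}(\mu_1, \mu_2) :: \Omega} \\[3ex]
\text{(fn)} & \dfrac{\Delta \uplus \{t{::}\kappa_1\} \vdash \mu :: \kappa_2}
                    {\Delta \vdash \lambda t{::}\kappa_1.\, \mu :: \kappa_1 \rightarrow \kappa_2} \\[3ex]
\text{(app)} & \dfrac{\Delta \vdash \mu_1 :: \kappa_1 \rightarrow \kappa_2 \qquad \Delta \vdash \mu_2 :: \kappa_1}
                     {\Delta \vdash \mu_1\, \mu_2 :: \kappa_2} \\[3ex]
\text{(trec)} & \dfrac{\Delta \vdash \mu :: \Omega \qquad \Delta \vdash \mu_{\mathrm{int}} :: \kappa \qquad
                       \Delta \vdash \mu_{\mathrm{arrow}} :: \Omega \rightarrow \Omega \rightarrow \kappa \rightarrow \kappa \rightarrow \kappa}
                      {\Delta \vdash \mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_{\mathrm{int}}; \mu_{\mathrm{arrow}}) :: \kappa}
\end{array}
\]

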


Figure  3.6:  Constructor  Formation 


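3.3 Static Semantics of $\lambda_i^{ML}$


The static semantics of $\lambda_i^{ML}$ consists of a collection of rules for deriving judgments of the
form

\[
\begin{array}{ll}
\Delta \vdash \mu :: \kappa & \mu \text{ is a constructor of kind } \kappa \\
\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa & \mu_1 \text{ and } \mu_2 \text{ are equivalent constructors} \\
\Delta \vdash \sigma & \sigma \text{ is a valid type} \\
\Delta \vdash \sigma_1 \equiv \sigma_2 & \sigma_1 \text{ and } \sigma_2 \text{ are equivalent types} \\
\Delta; \Gamma \vdash e : \sigma & e \text{ is a term of type } \sigma,
\end{array}
\]

where $\Delta$ is a kind assignment, mapping type variables ($t$) to kinds ($\kappa$), and $\Gamma$ is a type
assignment, mapping term variables ($x$) to types ($\sigma$). These judgments may be derived
from the axioms and inference rules of Figures 3.6, 3.7, 3.8, 3.9, and 3.10, respectively.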

Constructor formation (see Figure 3.6) is standard with the exception of Typerec.
Here, I require that the argument of the Typerec be of kind $\Omega$. When evaluating a Typerec,
one of the clauses is chosen according to the value of the argument. Any components
are passed as arguments to the clause as well as the unrolling of the Typerec on these
components. Therefore, the whole constructor is assigned the kind $\kappa$ only if $\mu_{\mathrm{int}}$ has kind $\kappa$
(since the Int constructor has no components) and $\mu_{\mathrm{arrow}}$ has kind $\Omega \rightarrow \Omega \rightarrow \kappa \rightarrow \kappa \rightarrow \kappa$
(since it has two components).
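
The constructor formation rules can be turned into a small kind checker. The Python sketch below is an illustration under assumptions of its own: the tuple encodings of kinds and constructors, the dictionary representation of the kind assignment, and the names fn_kind and kind_of are hypothetical and chosen only for this example.

```python
# Kinds:  "Omega"  or  ("->", k1, k2)
# Constructors (encoding assumed for this sketch):
#   ("tvar", t)  ("Int",)  ("Arrow", m1, m2)  ("lam", t, k, m)
#   ("app", m1, m2)  ("Typerec", m, m_int, m_arrow)

def fn_kind(*kinds):
    """Right-nested arrow kind: fn_kind(a, b, c) is a -> (b -> c)."""
    if len(kinds) == 1:
        return kinds[0]
    return ("->", kinds[0], fn_kind(*kinds[1:]))

def kind_of(delta, mu):
    """Return the kind of mu under the kind assignment delta, or raise TypeError."""
    tag = mu[0]
    if tag == "tvar":
        if mu[1] not in delta:
            raise TypeError("unbound type variable")
        return delta[mu[1]]
    if tag == "Int":
        return "Omega"
    if tag == "Arrow":
        if kind_of(delta, mu[1]) != "Omega" or kind_of(delta, mu[2]) != "Omega":
            raise TypeError("Arrow: components must have kind Omega")
        return "Omega"
    if tag == "lam":
        _, t, k1, body = mu
        return ("->", k1, kind_of({**delta, t: k1}, body))
    if tag == "app":
        k = kind_of(delta, mu[1])
        if isinstance(k, str) or k[1] != kind_of(delta, mu[2]):
            raise TypeError("app: argument kind mismatch")
        return k[2]
    if tag == "Typerec":
        _, arg, m_int, m_arrow = mu
        if kind_of(delta, arg) != "Omega":
            raise TypeError("Typerec: argument must have kind Omega")
        k = kind_of(delta, m_int)
        # two components plus one unrolled result for each component
        if kind_of(delta, m_arrow) != fn_kind("Omega", "Omega", k, k, k):
            raise TypeError("Typerec: arrow clause has the wrong kind")
        return k
    raise TypeError("unknown constructor form")
```

The kind demanded of the arrow clause mirrors the rewriting rule: two arguments of kind $\Omega$ for the components, two of kind $\kappa$ for the unrolled results, and a result of kind $\kappa$.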

To type check an expression, we need to be able to tell when two types are equivalent.
Since constructors can be injected into types, we need an appropriate notion of constructor
equivalence. Therefore, I define definitional equivalence [113, 87] via the judgment
$\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$. Figure 3.7 gives the axioms and inference rules that allow us to derive
definitional equivalence. The rules consist of $\beta$- and $\eta$-conversion, recursion equations
governing the Typerec form, and standard rules of equivalence and congruence. In the
following chapter, I show that every well-formed constructor $\mu$ has a unique normal form,
with respect to the obvious notion of reduction derived by orienting these equivalence
rules to the right. Furthermore, I show that this reduction relation is confluent, from
which it follows that constructor equivalence is decidable [113].

The type formation and equivalence rules can be found in Figures 3.8 and 3.9, respectively.
The rules of type equivalence define the interpretation $T(\mu)$ of the constructor
$\mu$ as a type. For example, $T(\mathsf{Int}) \equiv \mathsf{int}$ and $T(\mathsf{Arrow}(\mu_1, \mu_2)) \equiv T(\mu_1) \to T(\mu_2)$. Thus,
$T$ takes us from a constructor that names a type to the actual type. The other type
equivalence rules make the relation an equivalence and congruence with respect to the
type constructs.

The term formation rules may be found in Figure 3.10. Term formation judgments
are of the form $\Delta; \Gamma \vdash e : \sigma$. I make the implicit assumption that all free type variables
in $\sigma$, $e$, and the range of $\Gamma$ can be found in the domain of $\Delta$. Hence, $\Delta$ provides the set
of type variables that are in scope.

The term formation rules resemble the typing rules of Mini-ML (see Figure 2.4)
with the exception of the constructor abstraction, application, typerec, and equivalence
rules. Similar to value abstraction, constructor abstraction adds a new type variable
of the appropriate kind to the current kind assignment to give a type to the body of
the abstraction. Again, the "$\uplus$" notation ensures that the added variable does not
already occur in $\Delta$. For a constructor application, $e[\mu]$, if $e$ is given a polymorphic
type $\forall t{::}\kappa.\sigma$, and $\mu$ has kind $\kappa$ under the current kind assignment, then the
resulting expression has the type obtained by substituting $\mu$ for the free occurrences of $t$
within $\sigma$. The equivalence rule ascribes the type $\sigma$ to an expression of type $\sigma'$ if $\sigma$ and
$\sigma'$ are definitionally equivalent.

A typerec expression of the form $\mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma](e_{int}; e_{arrow})$ is given the type
obtained by substituting the argument constructor $\mu$ for $t$ in the type $\sigma$ if the following
conditions hold: First, $\mu$ must be a constructor of kind $\Omega$, since only these constructors
can be examined via typerec. Second, each of the clauses must have a type obtained by
substituting the appropriate constructor for $t$ within $\sigma$. Furthermore, $e_{arrow}$ must abstract
the components of the Arrow constructor as well as the result of unwinding the typerec
on these components.


3.4  Related  Work 

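There are two traditional interpretations of polymorphism, the explicit style (due to Girard
[47, 46] and Reynolds [106]), in which types are passed to polymorphic operations,
and the implicit style (due to Milner [89]), in which types are erased prior to execution.

($\beta$)
\[ \frac{\Delta \uplus \{t{::}\kappa'\} \vdash \mu_1 :: \kappa \qquad \Delta \vdash \mu_2 :: \kappa'}{\Delta \vdash (\lambda t{::}\kappa'.\ \mu_1)\ \mu_2 \equiv \{\mu_2/t\}\mu_1 :: \kappa} \]

($\eta$)
\[ \frac{\Delta \vdash \mu :: \kappa_1 \to \kappa_2}{\Delta \vdash \lambda t{::}\kappa_1.\ (\mu\ t) \equiv \mu :: \kappa_1 \to \kappa_2} \quad (t \notin Dom(\Delta)) \]

(trec-int)
\[ \frac{\Delta \vdash \mu_{int} :: \kappa \qquad \Delta \vdash \mu_{arrow} :: \Omega \to \Omega \to \kappa \to \kappa \to \kappa}{\Delta \vdash \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}) \equiv \mu_{int} :: \kappa} \]

(trec-arrow)
\[ \frac{\Delta \vdash \mu_1 :: \Omega \qquad \Delta \vdash \mu_2 :: \Omega \qquad \Delta \vdash \mu_{int} :: \kappa \qquad \Delta \vdash \mu_{arrow} :: \Omega \to \Omega \to \kappa \to \kappa \to \kappa}{\Delta \vdash \mathsf{Typerec}\ \mathsf{Arrow}(\mu_1, \mu_2)\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}) \equiv \mu_{arrow}\ \mu_1\ \mu_2\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_{int}; \mu_{arrow})) :: \kappa} \]

(refl)
\[ \frac{\Delta \vdash \mu :: \kappa}{\Delta \vdash \mu \equiv \mu :: \kappa} \]

(symm)
\[ \frac{\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa}{\Delta \vdash \mu_2 \equiv \mu_1 :: \kappa} \]

(tran)
\[ \frac{\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa \qquad \Delta \vdash \mu_2 \equiv \mu_3 :: \kappa}{\Delta \vdash \mu_1 \equiv \mu_3 :: \kappa} \]

(arrow)
\[ \frac{\Delta \vdash \mu_1 \equiv \mu_1' :: \Omega \qquad \Delta \vdash \mu_2 \equiv \mu_2' :: \Omega}{\Delta \vdash \mathsf{Arrow}(\mu_1, \mu_2) \equiv \mathsf{Arrow}(\mu_1', \mu_2') :: \Omega} \]

(fn)
\[ \frac{\Delta \uplus \{t{::}\kappa_1\} \vdash \mu_1 \equiv \mu_2 :: \kappa_2}{\Delta \vdash \lambda t{::}\kappa_1.\ \mu_1 \equiv \lambda t{::}\kappa_1.\ \mu_2 :: \kappa_1 \to \kappa_2} \]

(app)
\[ \frac{\Delta \vdash \mu_1 \equiv \mu_1' :: \kappa_1 \to \kappa_2 \qquad \Delta \vdash \mu_2 \equiv \mu_2' :: \kappa_1}{\Delta \vdash \mu_1\ \mu_2 \equiv \mu_1'\ \mu_2' :: \kappa_2} \]

(trec)
\[ \frac{\Delta \vdash \mu \equiv \mu' :: \Omega \qquad \Delta \vdash \mu_{int} \equiv \mu_{int}' :: \kappa \qquad \Delta \vdash \mu_{arrow} \equiv \mu_{arrow}' :: \Omega \to \Omega \to \kappa \to \kappa \to \kappa}{\Delta \vdash \mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}) \equiv \mathsf{Typerec}\ \mu'\ \mathsf{of}\ (\mu_{int}'; \mu_{arrow}') :: \kappa} \]
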
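Figure 3.7: Constructor Equivalence

\[ \Delta \vdash \mathsf{int} \qquad \frac{\Delta \vdash \mu :: \Omega}{\Delta \vdash T(\mu)} \qquad \frac{\Delta \vdash \sigma_1 \qquad \Delta \vdash \sigma_2}{\Delta \vdash \sigma_1 \to \sigma_2} \qquad \frac{\Delta \uplus \{t{::}\kappa\} \vdash \sigma}{\Delta \vdash \forall t{::}\kappa.\ \sigma} \]

Figure 3.8: Type Formation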


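\[ \Delta \vdash T(\mathsf{Int}) \equiv \mathsf{int} \qquad \Delta \vdash T(\mathsf{Arrow}(\mu_1, \mu_2)) \equiv T(\mu_1) \to T(\mu_2) \]

\[ \frac{\Delta \vdash \mu \equiv \mu' :: \Omega}{\Delta \vdash T(\mu) \equiv T(\mu')} \]

\[ \Delta \vdash \sigma \equiv \sigma \qquad \frac{\Delta \vdash \sigma \equiv \sigma'}{\Delta \vdash \sigma' \equiv \sigma} \qquad \frac{\Delta \vdash \sigma_1 \equiv \sigma_2 \qquad \Delta \vdash \sigma_2 \equiv \sigma_3}{\Delta \vdash \sigma_1 \equiv \sigma_3} \]

\[ \frac{\Delta \vdash \sigma_1 \equiv \sigma_1' \qquad \Delta \vdash \sigma_2 \equiv \sigma_2'}{\Delta \vdash (\sigma_1 \times \sigma_2) \equiv (\sigma_1' \times \sigma_2')} \qquad \frac{\Delta \vdash \sigma_1 \equiv \sigma_1' \qquad \Delta \vdash \sigma_2 \equiv \sigma_2'}{\Delta \vdash \sigma_1 \to \sigma_2 \equiv \sigma_1' \to \sigma_2'} \]

\[ \frac{\Delta \uplus \{t{::}\kappa\} \vdash \sigma \equiv \sigma'}{\Delta \vdash \forall t{::}\kappa.\ \sigma \equiv \forall t{::}\kappa.\ \sigma'} \]
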
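Figure 3.9: Type Equivalence

(var)
\[ \Delta; \Gamma \uplus \{x{:}\sigma\} \vdash x : \sigma \]

(int)
\[ \Delta; \Gamma \vdash i : \mathsf{int} \]

(fn)
\[ \frac{\Delta; \Gamma \uplus \{x{:}\sigma_1\} \vdash e : \sigma_2}{\Delta; \Gamma \vdash \lambda x{:}\sigma_1.\ e : \sigma_1 \to \sigma_2} \]

(app)
\[ \frac{\Delta; \Gamma \vdash e_1 : \sigma_1 \to \sigma_2 \qquad \Delta; \Gamma \vdash e_2 : \sigma_1}{\Delta; \Gamma \vdash e_1\ e_2 : \sigma_2} \]

(tfn)
\[ \frac{\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash e : \sigma}{\Delta; \Gamma \vdash \Lambda t{::}\kappa.\ e : \forall t{::}\kappa.\sigma} \]

(tapp)
\[ \frac{\Delta; \Gamma \vdash e : \forall t{::}\kappa.\sigma \qquad \Delta \vdash \mu :: \kappa}{\Delta; \Gamma \vdash e[\mu] : \{\mu/t\}\sigma} \]

(trec)
\[ \frac{\Delta \vdash \mu :: \Omega \qquad \Delta; \Gamma \vdash e_{int} : \{\mathsf{Int}/t\}\sigma \qquad \Delta; \Gamma \vdash e_{arrow} : \forall t_1{::}\Omega.\ \forall t_2{::}\Omega.\ \{t_1/t\}\sigma \to \{t_2/t\}\sigma \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma}{\Delta; \Gamma \vdash \mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma](e_{int}; e_{arrow}) : \{\mu/t\}\sigma} \]

(equiv)
\[ \frac{\Delta; \Gamma \vdash e : \sigma' \qquad \Delta \vdash \sigma' \equiv \sigma}{\Delta; \Gamma \vdash e : \sigma} \]

Figure 3.10: Term Formation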


In their study of the type theory of Standard ML [93, 59], Harper and Mitchell argued
that an explicitly-typed interpretation of ML polymorphism has better semantic properties
and scales more easily to cover a full programming language. Harper and Mitchell
formulated a predicative type theory, XML, a theory of dependent types augmented with
a universe of small types, adequate for capturing many aspects of SML. This type theory
was later refined by Harper, Mitchell, and Moggi [60], and provides the fundamental basis
for the type theory of $\lambda^{ML}_i$.

The idea of adding an inductively generated ground type (the natural numbers) with
an elimination rule like Typerec to the typed $\lambda$-calculus is implicit in Gödel's original
"functionals of finite type" [48]. Thus, the constructor language of $\lambda^{ML}_i$ is fundamentally
based on this work. According to Lambek and Scott [79], Marie-France Thibault [117]
studied the correspondence between such calculi and cartesian closed categories equipped
with "strong" natural number objects. However, the notion of constructor equivalence
in $\lambda^{ML}_i$ corresponds to what Lambek and Scott term a "weak" natural number object.

The idea of adding an inductively generated universe, with a term-level elimination
rule such as typerec, was derived from the universe elimination rules found in NuPrl [32],
though the idea was only described in unpublished work of Robert Constable. Harper
and I devised the original formulation of $\lambda^{ML}_i$ [62, 61].


Chapter  4 


Typing Properties of $\lambda^{ML}_i$


In this chapter, I present proofs of two important properties of $\lambda^{ML}_i$: Type checking
$\lambda^{ML}_i$ terms is decidable, and the type system is sound with respect to the operational
semantics. Hence, $\lambda^{ML}_i$ enjoys many of the same semantic properties as more conventional
typed calculi.

Readers  anxious  to  see  how  dynamic  type  dispatch  can  be  used  may  wish  to  skip  this 
chapter  and  come  back  to  it  later. 


4.1  Decidability of Type Checking for $\lambda^{ML}_i$

If we removed the equivalence rule from the term formation rules, then type checking $\lambda^{ML}_i$
terms would be entirely syntax-directed. This is due to the type label on $\lambda$-abstractions
at the expression level, the kind label on $\lambda$-abstractions at the constructor level, and the
type scheme $[t.\sigma]$ labelling typerec expressions. But in the presence of the equivalence
rule, we need an alternative method for determining whether an expression may be
assigned a type, and if so, what types it may be given.

In this section, I show that every well-formed constructor has a unique normal form
with respect to a certain notion of reduction, and that two constructors are definitionally
equivalent if and only if they have syntactically identical normal forms, modulo
$\alpha$-conversion. From this notion of normal forms for constructors, it is straightforward
to generalize to a normal form for types: Normalize any constructor components of the
type, and recursively replace $T(\mathsf{Int})$ with $\mathsf{int}$ and $T(\mathsf{Arrow}(\mu_1, \mu_2))$ with $T(\mu_1) \to T(\mu_2)$.
From this, it is easy to see that two types are definitionally equivalent iff their normal
forms are syntactically equivalent, modulo $\alpha$-conversion.
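The second step of this recipe, pushing $T$ inward over a constructor that is already in normal form, can be sketched as follows. The tuple encoding and the name `interp` are assumptions made for illustration only; the sketch does not normalize its argument, so a caller is assumed to have done that first.

```python
# Hypothetical encoding (an assumption of this sketch):
#   constructors: ("Int",), ("Arrow", m1, m2), or any other tuple form
#   types:        ("int",), ("fun", s1, s2), ("T", m)


def interp(mu):
    """Rewrite T(mu) using T(Int) = int and T(Arrow(m1, m2)) = T(m1) -> T(m2).

    Any other constructor (for example a variable) is left under T, since
    neither equation applies to it.
    """
    if mu == ("Int",):
        return ("int",)
    if mu[0] == "Arrow":
        return ("fun", interp(mu[1]), interp(mu[2]))
    return ("T", mu)


# T(Arrow(Int, t)) becomes int -> T(t).
example = interp(("Arrow", ("Int",), ("var", "t")))
```

Because the two equations are applied recursively, the result contains $T$ only around constructors whose head is neither Int nor Arrow.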

With normal forms for types, we can formulate a different proof system for the well-formedness
of $\lambda^{ML}_i$ terms that is entirely syntax directed: At each step in the derivation,
we replace the type on the right-hand side of the ":" with its normal form. Given a
procedure for determining normal forms, since the rest of the rules are syntax directed,
type checking in the new proof system is entirely syntax directed. Furthermore, it is
easy to see that the resulting proof system is equivalent to the original system. Any
proof in the new system can be transformed into a proof in the old system simply by
adding applications of the equivalence rule at each step, asserting that the type ascribed
by the old system and its normal form are equivalent. Any proof in the old system can
be transformed into a proof in the new system since normal forms are unique.

Consequently, if I can establish normal forms for constructors, show that two constructors
are definitionally equivalent iff they have the same normal form, and give a
procedure for finding normal forms, then we can use this procedure to formulate an
entirely syntax-directed type checking system.

4.1.1  Reduction  of  Constructors 

I begin by defining a reduction relation, $\mu \to \mu'$, on "raw" constructors that (potentially)
contain free variables. This primitive notion of reduction is generated from four rules:
one rule each for $\beta$- and $\eta$-reduction of functionals, and two rules for reducing Typerecs:

($\beta$)  $(\lambda t{::}\kappa.\ \mu)\ \mu' \to \{\mu'/t\}\mu$
($\eta$)  $(\lambda t{::}\kappa.\ (\mu\ t)) \to \mu$   $(t \notin FV(\mu))$

(t1)  $\mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}) \to \mu_{int}$
(t2)  $\mathsf{Typerec}\ \mathsf{Arrow}(\mu_1, \mu_2)\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}) \to$
      $\mu_{arrow}\ \mu_1\ \mu_2\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}))$

I use $T$ to abbreviate the union of these four relations:

\[ T = \beta \cup \eta \cup t1 \cup t2 \]

I extend the relation to form a congruence by defining term contexts, which are arbitrary
raw constructors with a single hole in them, much like constructor evaluation contexts,
but with no restrictions governing where the hole can occur:

\[ C ::= [\,] \mid \mathsf{Arrow}(C, \mu) \mid \mathsf{Arrow}(\mu, C) \mid \lambda t{::}\kappa.\ C \mid C\ \mu \mid \mu\ C \mid \mathsf{Typerec}\ C\ \mathsf{of}\ (\mu_{int}; \mu_{arrow}) \mid \mathsf{Typerec}\ \mu\ \mathsf{of}\ (C; \mu_{arrow}) \mid \mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_{int}; C) \]

The constructor $C[\mu]$ represents the result of replacing the hole in $C$ with the constructor
$\mu$, possibly binding free variables of $\mu$.

Definition 4.1.1  $\mu_1 \to \mu_2$ iff there exist $\mu_1'$, $\mu_2'$, and $C$ such that $\mu_1 = C[\mu_1']$, $\mu_2 = C[\mu_2']$,
and $(\mu_1', \mu_2') \in T$.

I write $\mu_1 \to^* \mu_2$ for the reflexive, transitive closure of $\mu_1 \to \mu_2$; $\mu_1 \leftrightarrow \mu_2$ for
the symmetric closure of $\mu_1 \to \mu_2$; and $\mu_1 \leftrightarrow^* \mu_2$ for the least equivalence relation
generated by $\mu_1 \to^* \mu_2$. The constructor $\mu$ is in normal form if there is no $\mu'$ such that
$\mu \to \mu'$. I say that a constructor $\mu$ is strongly normalizing (SN) if there is no infinite
reduction sequence $\mu \to \mu_1 \to \mu_2 \to \cdots$.
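As an informal illustration of the four rules and of reduction under an arbitrary context, the following sketch performs one reduction step at a time until a normal form is reached. The tuple encoding, the helper names (`free_vars`, `subst`, `step`, `normalize`), and the fresh-name scheme are assumptions of the sketch; the order in which redexes are chosen is one arbitrary strategy among many, and the loop is only guaranteed to stop on strongly normalizing inputs.

```python
import itertools

# Hypothetical encoding: ("Int",), ("Arrow", m1, m2), ("var", t),
# ("lam", t, kind, body), ("app", f, a), ("Typerec", m, m_int, m_arrow).
_fresh = itertools.count()


def free_vars(m):
    if m[0] == "var":
        return {m[1]}
    if m[0] == "Int":
        return set()
    if m[0] == "lam":
        return free_vars(m[3]) - {m[1]}
    return set().union(*(free_vars(c) for c in m[1:]))


def subst(m, t, r):
    """Capture-avoiding substitution {r/t}m."""
    if m[0] == "var":
        return r if m[1] == t else m
    if m[0] == "Int":
        return m
    if m[0] == "lam":
        _, x, k, body = m
        if x == t:
            return m
        if x in free_vars(r):  # rename the binder to avoid capture
            fresh = "%s%%%d" % (x, next(_fresh))
            body, x = subst(body, x, ("var", fresh)), fresh
        return ("lam", x, k, subst(body, t, r))
    return (m[0],) + tuple(subst(c, t, r) for c in m[1:])


def step(m):
    """Return some one-step reduct of m, or None if m is in normal form."""
    tag = m[0]
    if tag == "app" and m[1][0] == "lam":  # (beta)
        _, (_, t, _, body), arg = m
        return subst(body, t, arg)
    if tag == "lam":  # (eta)
        _, t, _, body = m
        if body[0] == "app" and body[2] == ("var", t) and t not in free_vars(body[1]):
            return body[1]
    if tag == "Typerec":
        _, arg, m_int, m_arrow = m
        if arg == ("Int",):  # (t1)
            return m_int
        if arg[0] == "Arrow":  # (t2)
            out = m_arrow
            for x in (arg[1], arg[2], ("Typerec", arg[1], m_int, m_arrow),
                      ("Typerec", arg[2], m_int, m_arrow)):
                out = ("app", out, x)
            return out
    # Otherwise reduce inside the first reducible sub-constructor (the context C).
    for i in range(3 if tag == "lam" else 1, len(m)):
        if isinstance(m[i], tuple):
            r = step(m[i])
            if r is not None:
                return m[:i] + (r,) + m[i + 1:]
    return None


def normalize(m):
    while True:
        r = step(m)
        if r is None:
            return m
        m = r
```

For instance, with the arrow clause $\lambda t_1.\lambda t_2.\lambda r_1.\lambda r_2.\ \mathsf{Arrow}(r_1, r_2)$ and the int clause Int, normalizing a Typerec over $\mathsf{Arrow}(\mathsf{Int}, \mathsf{Int})$ rebuilds $\mathsf{Arrow}(\mathsf{Int}, \mathsf{Int})$.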

Lemma 4.1.2

1.  If $\mu_1 \to \mu_2$, then $\{\mu_3/t\}\mu_1 \to \{\mu_3/t\}\mu_2$.

2.  If $\mu_1 \to \mu_2$, then $\{\mu_1/t\}\mu_3 \to^* \{\mu_2/t\}\mu_3$.

Proof (sketch): Straightforward induction on terms and case analysis of the reduction
relation. □

The following lemmas relate the reduction relation to definitional equivalence. In
particular, two well-formed constructors $\mu_1$ and $\mu_2$ are definitionally equivalent iff we can
convert from $\mu_1$ to $\mu_2$ via the $\leftrightarrow^*$ equivalence relation.

Lemma 4.1.3 (Substitution)  If $\Delta \uplus \{t{::}\kappa'\} \vdash \mu :: \kappa$ and $\Delta \vdash \mu' :: \kappa'$, then
$\Delta \vdash \{\mu'/t\}\mu :: \kappa$.

Lemma 4.1.4  If $\Delta \vdash C[\mu] :: \kappa$, then there exist some $\Delta'$ and $\kappa'$ such that $\Delta' \vdash \mu :: \kappa'$
and if $\Delta' \vdash \mu' :: \kappa'$, then $\Delta \vdash C[\mu'] :: \kappa$.

Proof (sketch): By induction on $C$. □

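Lemma 4.1.5 (Kind Preservation)  If $\Delta \vdash \mu :: \kappa$ and $\mu \to \mu'$, then $\Delta \vdash \mu' :: \kappa$.

Proof: Suppose $\Delta \vdash C[\mu] :: \kappa$. By the previous lemma, there exist some $\Delta'$ and $\kappa'$
such that $\Delta' \vdash \mu :: \kappa'$. Hence, it suffices to show that $(\mu, \mu') \in T$ implies $\Delta' \vdash \mu' :: \kappa'$.

$\beta$ : $\mu = (\lambda t{::}\kappa_1.\ \mu_1)\ \mu_2$ and $\mu' = \{\mu_2/t\}\mu_1$. Follows from the typing rule for functions
and the Substitution Lemma.

$\eta$ : $\mu = \lambda t{::}\kappa_1.\ (\mu_1\ t)$ $(t \notin FV(\mu_1))$ and $\mu' = \mu_1$. By the typing rule for functions,
$\Delta' \uplus \{t{::}\kappa_1\} \vdash \mu_1\ t :: \kappa_2$ for some $\kappa_2$ and $\kappa' = \kappa_1 \to \kappa_2$. By the application rule,
$\Delta' \uplus \{t{::}\kappa_1\} \vdash \mu_1 :: \kappa'$. Since $t$ does not occur free in $\mu_1$, $\Delta' \vdash \mu_1 :: \kappa'$.

t1 : $\mu = \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_i; \mu_a)$ and $\mu' = \mu_i$. By the typing rule for Typerec, $\Delta' \vdash \mu :: \kappa'$
and $\Delta' \vdash \mu_i :: \kappa'$.

t2 : $\mu = \mathsf{Typerec}\ \mathsf{Arrow}(\mu_1, \mu_2)\ \mathsf{of}\ (\mu_i; \mu_a)$ and

\[ \mu' = \mu_a\ \mu_1\ \mu_2\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i; \mu_a))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_i; \mu_a)). \]

By the typing rule for Typerec, $\Delta' \vdash \mu :: \kappa'$, $\Delta' \vdash \mu_i :: \kappa'$, and
$\Delta' \vdash \mu_a :: \Omega \to \Omega \to \kappa' \to \kappa' \to \kappa'$. By the rule for Arrow, $\Delta' \vdash \mu_1 :: \Omega$ and $\Delta' \vdash \mu_2 :: \Omega$. By the
application rule, $\mu_a\ \mu_1\ \mu_2$ has kind $\kappa' \to \kappa' \to \kappa'$. By the typing rule for Typerec,
$\Delta' \vdash \mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i; \mu_a) :: \kappa'$ and likewise for $\mu_2$. Thus, by the application rule,
$\Delta' \vdash \mu' :: \kappa'$.

□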


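Lemma 4.1.6  If $\Delta \vdash \mu :: \kappa$ and $\mu \to \mu'$, then $\Delta \vdash \mu \equiv \mu' :: \kappa$.

Proof (sketch): Suppose $\Delta \vdash C[\mu] :: \kappa$ and $(\mu, \mu') \in T$. Then there exist some
$\Delta'$, $\kappa'$ such that $\Delta' \vdash \mu :: \kappa'$ and by preservation, $\Delta' \vdash \mu' :: \kappa'$. Argue by cases that
$\Delta' \vdash \mu \equiv \mu' :: \kappa'$. Then show by induction on $C$ that $\Delta' \vdash \mu \equiv \mu' :: \kappa'$ implies
$\Delta \vdash C[\mu] \equiv C[\mu'] :: \kappa$. □

Lemma 4.1.7  If $\Delta \vdash \mu_1 :: \kappa$, $\Delta \vdash \mu_2 :: \kappa$, and there exists a $\mu$ such that $\mu_1 \to^* \mu$ and
$\mu_2 \to^* \mu$, then $\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$.

Proof: By the previous lemma, $\Delta \vdash \mu_1 \equiv \mu :: \kappa$ and $\Delta \vdash \mu_2 \equiv \mu :: \kappa$ and by symmetry
and transitivity, $\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$. □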

Lemma 4.1.8  $\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$ iff $\mu_1 \leftrightarrow^* \mu_2$.

Proof: The "if" direction is apparent from the previous lemmas. Hence, I must show
that $\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$ implies $\mu_1 \leftrightarrow^* \mu_2$. I argue by induction on the derivation of
$\Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$. Reflexivity, symmetry, and transitivity follow from the definition of
$\leftrightarrow^*$. The arrow, fn, app, and trec rules follow by building an appropriate context $C$
for each of the component constructors $\mu$, using the induction hypothesis to argue that
$\mu \equiv \mu'$ implies $\mu \leftrightarrow^* \mu'$ and thus $C[\mu] \leftrightarrow^* C[\mu']$. The subcases are glued together via
transitivity. The $\beta$, $\eta$, trec-int, and trec-arrow rules all follow from their counterparts in
$T$. □

The main results of this chapter are that the reduction relation "$\to^*$" is confluent
and strongly normalizing for well-formed constructors. That is, every reduction sequence
terminates and the reduction sequences have the diamond property: If a constructor
reduces to two different constructors, then those two constructors reduce to a common
constructor.

Proposition 4.1.9 (Strong Normalization)  If $\Delta \vdash \mu :: \kappa$, then $\mu$ is strongly normalizing.

Proposition 4.1.10 (Confluence)  If $\mu \to^* \mu_1$ and $\mu \to^* \mu_2$, then there exists a
constructor $\mu'$ such that $\mu_1 \to^* \mu'$ and $\mu_2 \to^* \mu'$.

Confluence for "$\to^*$" is important because of the following immediate corollaries:
First, normal forms are unique up to $\alpha$-conversion:

Corollary 4.1.11  If $\mu \to^* \mu_1$ and $\mu \to^* \mu_2$ and $\mu_1$ and $\mu_2$ are normal forms, then
$\mu_1 = \mu_2$.

Second, the reduction system has the "Church-Rosser" property, which tells us that
finding and comparing normal forms is a complete procedure for determining equivalence
of constructors.

Corollary 4.1.12 (Church-Rosser)  $\mu \leftrightarrow^* \mu'$ iff there exists a $\mu''$ such that $\mu \to^* \mu''$
and $\mu' \to^* \mu''$.

Proof: The "if" part is obvious. For the "only if", I argue by induction on the length
of the sequence $\mu \leftrightarrow \mu_1 \leftrightarrow \cdots \leftrightarrow \mu_n \leftrightarrow \mu'$. For $n = 0$, $\mu \leftrightarrow \mu'$ and thus either
$\mu \to \mu'$ or else $\mu' \to \mu$. Without loss of generality, assume $\mu \to \mu'$. Then choose
$\mu'' = \mu'$ and we are done. Assume the theorem holds for all values up through $n$. We
have $\mu \leftrightarrow \mu_1 \leftrightarrow \cdots \leftrightarrow \mu_n$ and $\mu_n \leftrightarrow \mu'$. I must show that there is a $\mu''$ such that
$\mu \to^* \mu''$ and $\mu' \to^* \mu''$. By the induction hypothesis, there exist $\mu_a$ and $\mu_b$ such
that $\mu \to^* \mu_a$, $\mu_n \to^* \mu_a$, $\mu_n \to^* \mu_b$, and $\mu' \to^* \mu_b$. Since $\mu_n$ reduces to both $\mu_a$ and
$\mu_b$, we have via confluence that there exists a $\mu''$ such that $\mu_a \to^* \mu''$ and $\mu_b \to^* \mu''$
and hence $\mu \to^* \mu''$ and $\mu' \to^* \mu''$. □

Since for well-formed constructors, convertibility is the same as definitional equivalence,
the Church-Rosser property implies that two well-formed constructors are definitionally
equivalent iff there is some common reduct. If I can show that all well-formed
constructors are strongly normalizing, then we have a decision procedure for determining
constructor equivalence: Choose any reduction sequence for the two constructors and
eventually, we will reach normal forms, since the two constructors are strongly normalizing.
Then, the Church-Rosser theorem together with unique normal forms tells us
that the two constructors are equivalent iff the two normal forms are equivalent modulo
$\alpha$-conversion.
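The final comparison step, equality of normal forms modulo $\alpha$-conversion, can be sketched by translating bound variables to de Bruijn indices and comparing the results syntactically. The tuple encoding and the names `to_de_bruijn` and `alpha_equal` are assumptions of this sketch; computing the normal forms themselves is assumed to be done beforehand by some separate normalization procedure.

```python
# Hypothetical encoding: ("Int",), ("Arrow", m1, m2), ("var", t),
# ("lam", t, kind, body), ("app", f, a), ("Typerec", m, m_int, m_arrow).


def to_de_bruijn(m, bound=()):
    """Replace bound variable names by their binder distance.

    `bound` lists the enclosing binders, innermost first, so the index of a
    name is the number of binders between the occurrence and its own binder.
    """
    tag = m[0]
    if tag == "var":
        name = m[1]
        if name in bound:
            return ("bvar", bound.index(name))
        return ("fvar", name)
    if tag == "lam":
        _, t, kind, body = m
        return ("lam", kind, to_de_bruijn(body, (t,) + bound))
    return (tag,) + tuple(to_de_bruijn(c, bound) for c in m[1:])


def alpha_equal(m1, m2):
    """Syntactic identity up to renaming of bound type variables."""
    return to_de_bruijn(m1) == to_de_bruijn(m2)
```

Under these assumptions, two well-formed constructors would be judged equivalent exactly when `alpha_equal` holds of their normal forms.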

In the presence of strong normalization, confluence is equivalent to local confluence:
If $\mu \to \mu_1$ and $\mu \to \mu_2$, then there exists a $\mu'$ such that $\mu_1 \to^* \mu'$ and $\mu_2 \to^* \mu'$.

Lemma 4.1.13  If $\mu$ is strongly normalizing and locally confluent, then $\mu$ is confluent.

Proof: Suppose $\mu \to^* \mu_1$ in $m$ steps and $\mu \to^* \mu_2$ in $n$ steps. Since $\mu$ is strongly
normalizing, there exists some bound $b$ on the number of steps any reduction sequence
can take. I argue by induction on $b$ and reduce the problem to $m \le 1$, $n \le 1$. If $m = 0$
or $n = 0$ then there is nothing to prove. The case $m = 1$, $n = 1$ is handled by local
confluence.

Suppose $m > 1$ or else $n > 1$. Then, we have $\mu \to \mu_1' \to^{m-1} \mu_1$ and
$\mu \to \mu_2' \to^{n-1} \mu_2$ where $m - 1 > 0$ or $n - 1 > 0$. By local confluence, we know there is a $\mu''$
such that $\mu_1' \to^* \mu''$ and $\mu_2' \to^* \mu''$.

Now $\mu_1'$ and $\mu_2'$ have smaller bounds than $\mu$, so by the induction hypothesis, there
exist $\mu_1''$ and $\mu_2''$ such that $\mu_1 \to^* \mu_1''$, $\mu'' \to^* \mu_1''$, $\mu_2 \to^* \mu_2''$, and $\mu'' \to^* \mu_2''$.

Again, $\mu''$ has a bound less than $b$, so applying the induction hypothesis, we know that
there exists a $\mu'$ such that $\mu_1'' \to^* \mu'$ and $\mu_2'' \to^* \mu'$. Thus, $\mu_1 \to^* \mu'$ and $\mu_2 \to^* \mu'$.

The result is summarized by the following diagram, where assuming the solid arrows,
we have shown that the dotted arrows exist:

[Diagram: reductions from $\mu$ to $\mu_1$ and $\mu_2$ (solid) completed by reductions to a common $\mu'$ (dotted)]

□
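The role of termination in this lemma can be seen on small finite examples. The sketch below is purely illustrative and works on an abstract relation given as a dictionary from a node to its one-step successors; the function names and both example systems are assumptions introduced here and have nothing to do with constructors as such.

```python
def reachable(rel, x):
    """All nodes y with x ->* y (reflexive, transitive closure)."""
    seen, stack = {x}, [x]
    while stack:
        y = stack.pop()
        for z in rel.get(y, ()):
            if z not in seen:
                seen.add(z)
                stack.append(z)
    return seen


def joinable(rel, a, b):
    """True when a and b have a common reduct."""
    return bool(reachable(rel, a) & reachable(rel, b))


def locally_confluent(rel):
    """Every pair of one-step reducts of a node is joinable."""
    return all(joinable(rel, a, b)
               for x in rel for a in rel[x] for b in rel[x])


def confluent(rel):
    """Every pair of many-step reducts of a node is joinable."""
    nodes = set(rel) | {y for ys in rel.values() for y in ys}
    return all(joinable(rel, a, b)
               for x in nodes
               for a in reachable(rel, x) for b in reachable(rel, x))


# A terminating system: local confluence and confluence coincide.
terminating = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

# A system with the cycle b -> c -> b: locally confluent but not confluent,
# because the normal forms a and d are both reachable from b.
cyclic = {"b": ["a", "c"], "c": ["b", "d"]}
```

The second system shows why the hypothesis of strong normalization cannot simply be dropped: every local peak can be joined, yet two distinct normal forms are reachable from the same node.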

All  that  remains  is  to  establish  local  confluence  and  strong  normalization,  which  I 
address  in  Sections  4.1.2  and  4.1.3,  respectively. 


4.1.2  Local  Confluence  for  Constructor  Reduction 

To show that the reduction system for constructors is locally confluent, I must show
that whenever $\mu \to \mu_1$ and $\mu \to \mu_2$, then there exists a $\mu'$ such that $\mu_1 \to^* \mu'$ and
$\mu_2 \to^* \mu'$.

Let $D$ range over the set of expressions with exactly two holes in them. This set can
be described by the following grammar:

\[ D ::= \mathsf{Arrow}(C_1, C_2) \mid C_1\ C_2 \mid \mathsf{Typerec}\ C_1\ \mathsf{of}\ (C_2; \mu_{arrow}) \mid \mathsf{Typerec}\ C_1\ \mathsf{of}\ (\mu_{int}; C_2) \mid \mathsf{Typerec}\ \mu\ \mathsf{of}\ (C_1; C_2) \mid C[D] \]

where I use $C[D]$ to denote the two-holed constructor formed by replacing the hole in $C$
with $D$ and I use $D[\mu_1, \mu_2]$ to denote the constructor obtained by replacing the left-most
hole in $D$ with $\mu_1$ and the right-most hole in $D$ with $\mu_2$.

Suppose $\mu = C_a[\mu_a]$, $(\mu_a, \mu_a') \in T$, and $\mu = C_b[\mu_b]$ and $(\mu_b, \mu_b') \in T$. The two
constructors $\mu_a$ and $\mu_b$ are said to overlap in $\mu$ if one constructor is found as a
subterm of the other (i.e., $\mu_a = C[\mu_b]$ or $\mu_b = C[\mu_a]$ for some $C$). If $\mu_a$ and $\mu_b$ do not
overlap, then it is clear that there exists some $D$ such that $\mu = D[\mu_a, \mu_b]$ and thus:

\[ \mu = D[\mu_a, \mu_b] \to D[\mu_a', \mu_b] \to D[\mu_a', \mu_b'] \]

and

\[ \mu = D[\mu_a, \mu_b] \to D[\mu_a, \mu_b'] \to D[\mu_a', \mu_b']. \]

Therefore, we need only consider overlapping constructors to show local confluence.

Let $\mu_a$ and $\mu_b$ be overlapping constructors in $\mu$, such that $\mu = C_a[\mu_a]$, $\mu_a = C_b[\mu_b]$,
and suppose $(\mu_a, \mu_a') \in T$ and $(\mu_b, \mu_b') \in T$. Without loss of generality, we may ignore the
outer context, $C_a$. There are four cases for the reduction from $\mu_a$ to $\mu_a'$. I consider each
case below and show that, for each rule taking $\mu_b$ to $\mu_b'$, there exists a $\mu'$ and a sequence of
reductions which takes $\mu_a'$ to $\mu'$ and $C_b[\mu_b']$ to $\mu'$. Each argument is made by presenting
a diagram where the left arrow represents the reduction from $\mu_a$ to $\mu_a'$, and the right
arrow represents the reduction from $\mu_a = C_b[\mu_b]$ to $C_b[\mu_b']$. The solid arrows represent
assumptions, whereas the dotted arrows represent the relations I claim exist. I use $t$ to
represent some arbitrary reduction from $T$, $\alpha$ to represent $\alpha$-conversion, $=$ to represent
zero reductions, and $t^*$ to represent zero or more applications of a $t$ reduction.

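case $\beta$: Both sub-cases follow from Lemma 4.1.2, parts 1 and 2 respectively.

Inner reduction $\mu_1 \to \mu_1'$:
\[ (\lambda t{::}\kappa.\ \mu_1)\ \mu_2 \xrightarrow{\ \beta\ } \{\mu_2/t\}\mu_1 \xrightarrow{\ t\ } \{\mu_2/t\}\mu_1' \]
\[ (\lambda t{::}\kappa.\ \mu_1)\ \mu_2 \xrightarrow{\ t\ } (\lambda t{::}\kappa.\ \mu_1')\ \mu_2 \xrightarrow{\ \beta\ } \{\mu_2/t\}\mu_1' \]

Inner reduction $\mu_2 \to \mu_2'$:
\[ (\lambda t{::}\kappa.\ \mu_1)\ \mu_2 \xrightarrow{\ \beta\ } \{\mu_2/t\}\mu_1 \xrightarrow{\ t^*\ } \{\mu_2'/t\}\mu_1 \]
\[ (\lambda t{::}\kappa.\ \mu_1)\ \mu_2 \xrightarrow{\ t\ } (\lambda t{::}\kappa.\ \mu_1)\ \mu_2' \xrightarrow{\ \beta\ } \{\mu_2'/t\}\mu_1 \]

case $\eta$: The constructor is an $\eta$-redex $\lambda t{::}\kappa.\ (\mu_1\ t)$. If the inner reduction occurs within
$\mu_1$, then the result is obvious, since $t$ cannot occur freely within $\mu_1$. If the inner reduction
is an application of $\beta$ because $\mu_1$ is a function, then the results of the outer reduction
and inner reduction yield terms that are equal, modulo $\alpha$-conversion.

With $\mu_1 = \lambda t'{::}\kappa.\ \mu_1'$:
\[ \lambda t{::}\kappa.\ ((\lambda t'{::}\kappa.\ \mu_1')\ t) \xrightarrow{\ \eta\ } \lambda t'{::}\kappa.\ \mu_1' \]
\[ \lambda t{::}\kappa.\ ((\lambda t'{::}\kappa.\ \mu_1')\ t) \xrightarrow{\ \beta\ } \lambda t{::}\kappa.\ \{t/t'\}\mu_1' \]
and the two results are related by $\alpha$.

case t1: The constructor is a Typerec applied to Int. The inner reduction either modifies
$\mu_i$ or $\mu_a$. If $\mu_i$ is reduced, then after performing the Typerec reduction, the same
reduction can be applied. If $\mu_a$ is reduced, then after performing the Typerec reduction,
$\mu_a$ disappears, and the terms are equal.

Inner reduction $\mu_i \to \mu_i'$:
\[ \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_i; \mu_a) \xrightarrow{\ t1\ } \mu_i \xrightarrow{\ t\ } \mu_i' \]
\[ \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_i; \mu_a) \xrightarrow{\ t\ } \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_i'; \mu_a) \xrightarrow{\ t1\ } \mu_i' \]

Inner reduction $\mu_a \to \mu_a'$:
\[ \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_i; \mu_a) \xrightarrow{\ t1\ } \mu_i \]
\[ \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_i; \mu_a) \xrightarrow{\ t\ } \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_i; \mu_a') \xrightarrow{\ t1\ } \mu_i \]
and the two results are related by $=$.
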
case t2: The constructor is a Typerec applied to $\mathsf{Arrow}(\mu_1, \mu_2)$. There are four sub-cases:
$\mu_1$, $\mu_2$, $\mu_i$, or $\mu_a$ is reduced. In any of these cases, we can apply the same reduction,
possibly multiple times, to yield the same term.

In each sub-case the outer reduction is
\[ \mathsf{Typerec}\ \mathsf{Arrow}(\mu_1, \mu_2)\ \mathsf{of}\ (\mu_i; \mu_a) \xrightarrow{\ t2\ } \mu_a\ \mu_1\ \mu_2\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i; \mu_a))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_i; \mu_a)) \]
and its result reaches the common term below by $t^*$, while the inner reduction followed by t2
reaches the same term.

Inner reduction $\mu_1 \to \mu_1'$: the common term is
\[ \mu_a\ \mu_1'\ \mu_2\ (\mathsf{Typerec}\ \mu_1'\ \mathsf{of}\ (\mu_i; \mu_a))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_i; \mu_a)). \]

Inner reduction $\mu_2 \to \mu_2'$: the common term is
\[ \mu_a\ \mu_1\ \mu_2'\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i; \mu_a))\ (\mathsf{Typerec}\ \mu_2'\ \mathsf{of}\ (\mu_i; \mu_a)). \]

Inner reduction $\mu_i \to \mu_i'$: the common term is
\[ \mu_a\ \mu_1\ \mu_2\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i'; \mu_a))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_i'; \mu_a)). \]

Inner reduction $\mu_a \to \mu_a'$: the common term is
\[ \mu_a'\ \mu_1\ \mu_2\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i; \mu_a'))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_i; \mu_a')). \]

Finally,  from  the  diagrams  above,  we  have  the  desired  property. 

Theorem 4.1.14 (Local Confluence)  If $\mu \to \mu_1$ and $\mu \to \mu_2$, then there exists a
$\mu'$ such that $\mu_1 \to^* \mu'$ and $\mu_2 \to^* \mu'$.

All  that  remains  is  to  show  that  well-formed  constructors  are  strongly  normalizing. 

4.1.3  Strong  Normalization  for  Constructor  Reduction 

Our  proof  of  strong  normalization  for  the  constructor  reduction  relation  uses  unary  logical 
relations  (predicates),  but  in  a  setting  where  we  can  have  open  terms.  The  ideas  follow 
closely  those  of  Harper  [55]  and  Lambek  and  Scott  [79]. 

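The predicates are indexed by both a kind ($\kappa$) and a kind assignment ($\Delta$) and are
defined as follows:

\[ \|\Omega\|_\Delta = \{\mu \mid \Delta \vdash \mu :: \Omega,\ \mu\ SN\} \]

\[ \|\kappa_1 \to \kappa_2\|_\Delta = \{\mu \mid \Delta \vdash \mu :: \kappa_1 \to \kappa_2,\ \forall \Delta' \supseteq \Delta.\ \forall \mu_1 \in \|\kappa_1\|_{\Delta'}.\ \mu\ \mu_1 \in \|\kappa_2\|_{\Delta'}\} \]

The idea is to include only those constructors that are strongly normalizing and then
show that every well-formed constructor is in the appropriate set. From the definitions,
it is clear that if $\mu \in \|\kappa\|_\Delta$ then $\Delta \vdash \mu :: \kappa$ and for all $\Delta' \supseteq \Delta$, $\mu \in \|\kappa\|_{\Delta'}$.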

Lemma 4.1.15  $t\ \mu_1\ \mu_2 \cdots \mu_n$ is SN iff each $\mu_i$ is SN.

The following lemma shows that every constructor in one of the sets is strongly
normalizing and that a variable $t$ is always in $\|\kappa\|_\Delta$, whenever $\Delta(t) = \kappa$.

Lemma 4.1.16

1.  If $\mu \in \|\kappa\|_\Delta$, then $\mu$ is SN.

2.  If $\Delta \vdash t :: \kappa_1 \to \cdots \to \kappa_n \to \kappa$, $\Delta' \vdash \mu_i :: \kappa_i$ and $\mu_i$ is SN ($1 \le i \le n$) for some
$\Delta' \supseteq \Delta$, then $t\ \mu_1 \cdots \mu_n \in \|\kappa\|_{\Delta'}$.

Proof: Simultaneously by induction on $\kappa$. If $\kappa = \Omega$, then part 1 is built in to the
definition and part 2 follows since $t\ \mu_1 \cdots \mu_n$ is SN whenever each of the $\mu_i$ is SN.

Suppose $\kappa = \kappa' \to \kappa''$ and $\mu \in \|\kappa\|_\Delta$. Take $\Delta' = \Delta \uplus \{t{::}\kappa'\}$. By induction hypothesis
2, $t \in \|\kappa'\|_{\Delta'}$ and $\mu\ t \in \|\kappa''\|_{\Delta'}$, from which it follows via induction hypothesis 1 that $\mu\ t$
is SN. This in turn implies that $\mu$ is SN.

Suppose $\kappa = \kappa' \to \kappa''$, $\Delta \vdash t :: \kappa_1 \to \cdots \to \kappa_n \to \kappa$, and $\Delta' \vdash \mu_i :: \kappa_i$, $\mu_i$ is SN for some
$\Delta' \supseteq \Delta$. Let $\mu \in \|\kappa'\|_{\Delta'}$. I must show that $t\ \mu_1 \cdots \mu_n\ \mu \in \|\kappa''\|_{\Delta'}$. But by induction
hypothesis 1, $\mu$ is SN. Hence, by induction hypothesis 2, the result holds. □

Corollary 4.1.17  If $\Delta \vdash t :: \kappa$, then $t \in \|\kappa\|_\Delta$.

The following lemma shows that the predicates are closed under a suitable notion of
$\beta$-expansion. This lemma is crucial for showing that well-formed $\lambda$-terms are in the sets
and therefore are strongly normalizing.

Lemma 4.1.18  Suppose $\Delta \uplus \{t{::}\kappa_0\} \vdash \mu :: \kappa_1 \to \cdots \to \kappa_n \to \kappa$ and $\mu_i \in \|\kappa_i\|_\Delta$
($0 \le i \le n$). If $(\{\mu_0/t\}\mu)\ \mu_1 \cdots \mu_n \in \|\kappa\|_\Delta$, then $(\lambda t{::}\kappa_0.\ \mu)\ \mu_0\ \mu_1 \cdots \mu_n \in \|\kappa\|_\Delta$.

Proof:

By induction on $\kappa$. Assume $\kappa = \Omega$. I must show $\mu_a = (\lambda t{::}\kappa_0.\ \mu)\ \mu_0\ \mu_1 \cdots \mu_n$ is in
$\|\Omega\|_\Delta$. It suffices to show that this term is SN. Suppose not. Then there exists some
infinite reduction sequence starting with $\mu_a$. Since $(\{\mu_0/t\}\mu)\ \mu_1 \cdots \mu_n \in \|\Omega\|_\Delta$, this
term is SN and hence all of its components are. Moreover, since $\mu_0 \in \|\kappa_0\|_\Delta$, $\mu_0$ is SN.
Hence, no infinite reduction sequence can take place only within $\mu, \mu_0, \mu_1, \ldots, \mu_n$. Thus,
any infinite reduction sequence has the form:

\[ (\lambda t{::}\kappa_0.\ \mu)\ \mu_0\ \mu_1 \cdots \mu_n \to^* (\lambda t{::}\kappa_0.\ \mu')\ \mu_0'\ \mu_1' \cdots \mu_n' \to (\{\mu_0'/t\}\mu')\ \mu_1' \cdots \mu_n' \to \cdots \]

Consequently, $\mu \to^* \mu'$, $\mu_0 \to^* \mu_0'$, and $\{\mu_0/t\}\mu \to^* \{\mu_0'/t\}\mu'$. Thus, we can construct
an infinite reduction sequence:

\[ (\{\mu_0/t\}\mu)\ \mu_1 \cdots \mu_n \to^* (\{\mu_0'/t\}\mu')\ \mu_1' \cdots \mu_n' \to \cdots \]

contradicting the assumption. Therefore, $\mu_a \in \|\Omega\|_\Delta$.

Assume $\kappa = \kappa' \to \kappa''$, and let $\mu' \in \|\kappa'\|_{\Delta'}$ for $\Delta' \supseteq \Delta$. I must show that

\[ (\lambda t{::}\kappa_0.\ \mu)\ \mu_0\ \mu_1 \cdots \mu_n\ \mu' \]

is in $\|\kappa''\|_{\Delta'}$. By the induction hypothesis, this holds if

\[ (\{\mu_0/t\}\mu)\ \mu_1 \cdots \mu_n\ \mu' \]

is in $\|\kappa''\|_{\Delta'}$. But this follows from the assumption that

\[ (\{\mu_0/t\}\mu)\ \mu_1 \cdots \mu_n \]

is in $\|\kappa' \to \kappa''\|_\Delta$. □

Corollary 4.1.19  If $\Delta \uplus \{t{::}\kappa_0\} \vdash \mu :: \kappa$, $\mu_0 \in \|\kappa_0\|_\Delta$, and $\{\mu_0/t\}\mu \in \|\kappa\|_\Delta$, then
$(\lambda t{::}\kappa_0.\ \mu)\ \mu_0 \in \|\kappa\|_\Delta$.

The  following  lemma  shows  that  the  predicates  are  closed  under  a  suitable  notion 
of  Typerec-expansion.  This  lemma  is  crucial  for  showing  that  well-formed  Typerec- 
constructors  are  in  the  sets  and  hence  are  strongly  normalizing. 

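Lemma 4.1.20 Let $\kappa' = \kappa_1 \to \cdots \to \kappa_n \to \kappa$ and suppose $\Delta \vdash \mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a) :: \kappa'$, $\mu_j \in \|\kappa_j\|_\Delta$ $(1 \le j \le n)$, $\mu \in \|\Omega\|_\Delta$, $\mu_i \in \|\kappa'\|_\Delta$, and $\mu_a \in \|\Omega \to \Omega \to \kappa' \to \kappa' \to \kappa'\|_\Delta$. Then $\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a) \in \|\kappa'\|_\Delta$.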

Proof: 

By induction on $\kappa$. If $\kappa = \Omega$, then it suffices to show that

$$\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a)\ \mu_1 \cdots \mu_n$$

is SN. I argue by induction on the height ($h$) of the normal form of $\mu$. Suppose $h = 0$. Then the normal form of $\mu$ is either a variable or Int. Since $\mu \in \|\Omega\|_\Delta$, $\mu_i \in \|\kappa'\|_\Delta$, $\mu_a$ is in

$$\|\Omega \to \Omega \to \kappa' \to \kappa' \to \kappa'\|_\Delta,$$

and $\mu_j \in \|\kappa_j\|_\Delta$ $(1 \le j \le n)$, the constructors $\mu$, $\mu_i$, $\mu_a$, and $\mu_j$ are all SN. Hence, no infinite reduction sequence can occur only within these terms. Thus, any such sequence must perform a Typerec reduction and the normal form of $\mu$ must be Int, so the infinite reduction sequence has the form:

$$\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a)\ \mu_1 \cdots \mu_n \;\to^*\; \mathsf{Typerec}\ \mathsf{Int}\ \mathsf{of}\ (\mu_i'; \mu_a')\ \mu_1' \cdots \mu_n' \;\to\; \mu_i'\,\mu_1' \cdots \mu_n' \;\to\; \cdots$$

But since $\mu_i \in \|\kappa'\|_\Delta$, and $\mu_j \in \|\kappa_j\|_\Delta$ $(1 \le j \le n)$, $\mu_i'\,\mu_1' \cdots \mu_n'$ is SN. Therefore, $\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a) \in \|\kappa'\|_\Delta$ when $h = 0$.

Suppose the claim holds for all terms with normal forms of height less than $h$ and suppose the normal form of $\mu$ has height $h + 1$. By the same argument as before, any infinite reduction sequence must perform a Typerec reduction. Furthermore, since the normal form of $\mu$ has height $h + 1$, such a sequence must have the form:

$$\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a)\ \mu_1 \cdots \mu_n \;\to^*\; \mathsf{Typerec}\ \mathsf{Arrow}(\mu_1, \mu_2)\ \mathsf{of}\ (\mu_i'; \mu_a')\ \mu_1' \cdots \mu_n' \;\to$$

$$(\mu_a'\ \mu_1\ \mu_2\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i'; \mu_a'))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_i'; \mu_a')))\ \mu_1' \cdots \mu_n' \;\to\; \cdots$$

Since $\mu$ is SN, it must be the case that $\mu_1$ and $\mu_2$ are SN, and hence $\mu_1, \mu_2 \in \|\Omega\|_\Delta$. Furthermore, the heights of the normal forms of $\mu_1$ and $\mu_2$ must be less than or equal to $h$. Hence, via the inner induction hypothesis, $\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i'; \mu_a') \in \|\Omega\|_\Delta$ and $\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_i'; \mu_a') \in \|\Omega\|_\Delta$. Since by assumption $\mu_a \in \|\Omega \to \Omega \to \kappa' \to \kappa' \to \kappa'\|_\Delta$,

$$(\mu_a'\ \mu_1\ \mu_2\ (\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i'; \mu_a'))\ (\mathsf{Typerec}\ \mu_2\ \mathsf{of}\ (\mu_i'; \mu_a')))\ \mu_1' \cdots \mu_n'$$

is SN and in $\|\Omega\|_\Delta$.

Now suppose $\kappa = \kappa' \to \kappa''$ and let $\mu' \in \|\kappa'\|_{\Delta'}$ for some $\Delta' \supseteq \Delta$. I must show that $(\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a))\ \mu_1 \cdots \mu_n\ \mu' \in \|\kappa''\|_{\Delta'}$. But this follows from the outer induction hypothesis. □

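Corollary 4.1.21 If $\Delta \vdash \mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a) :: \kappa$, $\mu \in \|\Omega\|_\Delta$, $\mu_i \in \|\kappa\|_\Delta$, and $\mu_a \in \|\Omega \to \Omega \to \kappa \to \kappa \to \kappa\|_\Delta$, then $\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a) \in \|\kappa\|_\Delta$.

Let $\delta$ range over substitutions of constructors for constructor variables.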

Definition  4.1.22 

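1. $\Delta' \Vdash \mu :: \kappa[\delta]$ iff $\delta(\mu) \in \|\kappa\|_{\Delta'}$.

2. $\Delta' \Vdash \delta :: \Delta$ iff $\mathrm{Dom}(\delta) = \mathrm{Dom}(\Delta)$ and for each $t$ in $\mathrm{Dom}(\delta)$, $\delta(t) \in \|\Delta(t)\|_{\Delta'}$.

3. $\Vdash \Delta \vdash \mu :: \kappa$ iff for every $\Delta'$ and every $\delta$ such that $\Delta' \Vdash \delta :: \Delta$, $\Delta' \Vdash \mu :: \kappa[\delta]$.

Theorem 4.1.23 If $\Delta \vdash \mu :: \kappa$, then $\Vdash \Delta \vdash \mu :: \kappa$.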

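Proof: By induction on the derivation of $\Delta \vdash \mu :: \kappa$. Suppose $\Delta' \Vdash \delta :: \Delta$.

var: Holds by the assumption $\Delta' \Vdash \delta :: \Delta$.

int: Holds, since $\delta(\mathsf{Int}) = \mathsf{Int}$ is trivially SN and thus $\mathsf{Int} \in \|\Omega\|_{\Delta'}$.

arrow: Must show $\Delta' \Vdash \mathsf{Arrow}(\mu_1, \mu_2) :: \Omega[\delta]$. By the induction hypothesis, $\delta(\mu_1) \in \|\Omega\|_{\Delta'}$ and $\delta(\mu_2) \in \|\Omega\|_{\Delta'}$. Hence, $\delta(\mu_1)$ and $\delta(\mu_2)$ are SN and $\mathsf{Arrow}(\delta(\mu_1), \delta(\mu_2))$ is SN. Thus, $\delta(\mathsf{Arrow}(\mu_1, \mu_2)) \in \|\Omega\|_{\Delta'}$.

fn: Must show $\delta(\lambda t{::}\kappa_1.\,\mu) \in \|\kappa_1 \to \kappa_2\|_{\Delta'}$. Let $\mu_1 \in \|\kappa_1\|_{\Delta''}$ for some extension $\Delta''$ of $\Delta'$. I must show that $\{\mu_1/t\}\delta(\mu) \in \|\kappa_2\|_{\Delta''}$. By the induction hypothesis, $\delta(\mu) \in \|\kappa_2\|_{\Delta''}$, so the result follows from Corollary 4.1.19.

app: Must show $\delta(\mu_1\,\mu_2) \in \|\kappa_2\|_{\Delta'}$. By the induction hypotheses, $\delta(\mu_1) \in \|\kappa_1 \to \kappa_2\|_{\Delta'}$ and $\delta(\mu_2) \in \|\kappa_1\|_{\Delta'}$, so the result follows from the definition of $\|\kappa_1 \to \kappa_2\|_{\Delta'}$.

trec: Must show $\delta(\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a)) \in \|\kappa\|_{\Delta'}$. By the induction hypotheses, $\delta(\mu) \in \|\Omega\|_{\Delta'}$, $\delta(\mu_i) \in \|\kappa\|_{\Delta'}$, and $\delta(\mu_a) \in \|\Omega \to \Omega \to \kappa \to \kappa \to \kappa\|_{\Delta'}$, so the result follows from Corollary 4.1.21. □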

Corollary 4.1.24 (Strong Normalization) If $\Delta \vdash \mu :: \kappa$, then $\mu$ is strongly normalizing.

Proof: Pick $\delta$ to be the identity substitution for $\Delta$. That is, $\delta = \{t \mapsto t \mid t \in \mathrm{Dom}(\Delta)\}$. Then it is easy to see that $\Delta \Vdash \delta :: \Delta$. Moreover, $\delta(\mu) = \mu \in \|\kappa\|_\Delta$ and thus $\mu$ is SN. □


Corollary  4.1.25 

1. Every constructor $\mu$ has a unique normal form, $NF(\mu)$.

2. If $\mu$ is well-formed, there is an algorithm to calculate $NF(\mu)$.

3.  Conversion  of  well-formed  constructors  is  decidable. 
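
The algorithm behind item 2 can be sketched as a small normalizer. The following Python sketch is only an illustration under stated assumptions: the tuple encoding of constructors, the omission of kind annotations, and the names `nf`, `subst`, `free_vars`, and `app` are hypothetical and do not come from the text. It applies the β rule and the two Typerec rules until none applies; by strong normalization this terminates on well-formed constructors, but it may loop on ill-kinded input.

```python
import itertools

# Hypothetical encoding (kinds omitted):
#   ("var", t) | ("int",) | ("arrow", m1, m2) | ("lam", t, m)
#   ("app", m1, m2) | ("typerec", m, m_i, m_a)
_fresh = itertools.count()

def app(f, *args):
    """Left-nested application f a1 ... an."""
    for a in args:
        f = ("app", f, a)
    return f

def free_vars(m):
    tag = m[0]
    if tag == "var":
        return {m[1]}
    if tag == "int":
        return set()
    if tag == "lam":
        return free_vars(m[2]) - {m[1]}
    return set().union(*(free_vars(c) for c in m[1:]))

def subst(m, t, r):
    """Capture-avoiding substitution {r/t}m."""
    tag = m[0]
    if tag == "var":
        return r if m[1] == t else m
    if tag == "int":
        return m
    if tag == "lam":
        b, body = m[1], m[2]
        if b == t:
            return m
        if b in free_vars(r):  # rename the binder to avoid capture
            fresh = f"{b}%{next(_fresh)}"
            body = subst(body, b, ("var", fresh))
            b = fresh
        return ("lam", b, subst(body, t, r))
    return (tag,) + tuple(subst(c, t, r) for c in m[1:])

def nf(m):
    """Normal form under beta and the two Typerec rules."""
    tag = m[0]
    if tag in ("var", "int"):
        return m
    if tag == "arrow":
        return ("arrow", nf(m[1]), nf(m[2]))
    if tag == "lam":
        return ("lam", m[1], nf(m[2]))
    if tag == "app":
        f, a = nf(m[1]), nf(m[2])
        if f[0] == "lam":  # beta
            return nf(subst(f[2], f[1], a))
        return ("app", f, a)
    s, mi, ma = nf(m[1]), m[2], m[3]  # tag == "typerec"
    if s[0] == "int":  # Typerec Int of (m_i; m_a) -> m_i
        return nf(mi)
    if s[0] == "arrow":  # unfold one level of the recursion
        r1 = ("typerec", s[1], mi, ma)
        r2 = ("typerec", s[2], mi, ma)
        return nf(app(ma, s[1], s[2], r1, r2))
    return ("typerec", s, nf(mi), nf(ma))

INT = ("int",)
# Arrow case that swaps the two recursive results.
swap = ("lam", "a", ("lam", "b", ("lam", "ra", ("lam", "rb",
        ("arrow", ("var", "rb"), ("var", "ra"))))))
example = ("typerec", ("arrow", INT, ("arrow", INT, INT)), INT, swap)
print(nf(example))
```

Since normal forms are unique, conversion of two well-formed constructors can then be decided by comparing their normal forms up to renaming of bound variables, which is the content of item 3.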

4.1.4  Decidability  of  Type  Checking 

In  the  following  definitions,  I  establish  a  suitable  notion  of  a  normalized  derivation  for 
typing  judgments.  I  then  show  that  a  term  is  well  typed  iff  there  is  a  normal  derivation 
of  the  judgment.  The  proof  is  constructive,  and  thus  provides  an  algorithm  for  type 
checking $\lambda_i^{ML}$.

Definition 4.1.26 (Normal Types, Judgments) A type $\sigma$ is in normal form iff it is int, $T(\mu)$ where $\mu$ is normal, $\sigma_1 \to \sigma_2$ where $\sigma_1$ and $\sigma_2$ are normal, or $\forall t{::}\kappa.\sigma'$ where $\sigma'$ is normal. A judgment $\Delta; \Gamma \vdash e : \sigma$ is normal iff $\sigma$ is normal.

From the properties of constructors, it is clear that every well-formed type $\sigma$ has a unique normal form $NF(\sigma)$, and that finding this form is decidable: we simply normalize all of the constructor components and replace all occurrences of $T(\mathsf{Int})$ with int and all occurrences of $T(\mathsf{Arrow}(\mu_1, \mu_2))$ with $T(\mu_1) \to T(\mu_2)$. Furthermore, this normalization process preserves type equivalence.
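
The type-normalization step just described can be sketched in a few lines of Python. This is a sketch under assumptions: the tuple encoding of types and constructors and the names `nf_type` and `expand` are hypothetical, and the constructor normalizer is passed in as a parameter `nf_con` (the identity by default, which is only adequate when the constructor components are already normal).

```python
# Hypothetical encoding:
#   types:        ("int",) | ("T", m) | ("fun", s1, s2) | ("all", t, s)
#   constructors: ("var", t) | ("int",) | ("arrow", m1, m2) | ...

def expand(m):
    """Rewrite T(m) for a normal constructor m."""
    if m[0] == "int":    # T(Int) becomes int
        return ("int",)
    if m[0] == "arrow":  # T(Arrow(m1, m2)) becomes T(m1) -> T(m2)
        return ("fun", expand(m[1]), expand(m[2]))
    return ("T", m)      # any other normal constructor stays under T

def nf_type(s, nf_con=lambda m: m):
    """Normal form of a type; nf_con normalizes constructor components."""
    tag = s[0]
    if tag == "int":
        return s
    if tag == "fun":
        return ("fun", nf_type(s[1], nf_con), nf_type(s[2], nf_con))
    if tag == "all":
        return ("all", s[1], nf_type(s[2], nf_con))
    if tag == "T":
        return expand(nf_con(s[1]))
    raise ValueError(f"not a type: {s!r}")

sigma = ("all", "t", ("T", ("arrow", ("int",), ("var", "t"))))
print(nf_type(sigma))
```

Here the example type stands for $\forall t.\,T(\mathsf{Arrow}(\mathsf{Int}, t))$, and the result stands for $\forall t.\,\mathsf{int} \to T(t)$.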
Lemma 4.1.27 If $\Delta \vdash \sigma$, then $\Delta \vdash \sigma \equiv NF(\sigma)$.

Proof (sketch): Follows from confluence and strong normalization of constructors. □

In  the  absence  of  typerec,  a  normal  typing  derivation  is  simply  a  derivation  where 
we  interleave  uses  of  the  non-equiv  rules  and  a  single  use  of  the  equiv  rule  to  normalize 
the  resulting  type.  However,  for  uses  of  typerec,  we  need  additional  uses  of  equiv  to 
“undo”  the  normalization  for  the  inductive  cases. 


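Definition 4.1.28 (Normal Derivations) A typing derivation $\mathcal{D}$ of the normal judgment $\Delta; \Gamma \vdash e : NF(\sigma)$ is in normal form iff $\mathcal{D}$ ends in an application of the equiv rule,

$$\dfrac{\dfrac{\mathcal{D}_1 \;\cdots\; \mathcal{D}_n}{\Delta; \Gamma \vdash e : \sigma}\,(R) \qquad \Delta \vdash \sigma \equiv NF(\sigma)}{\Delta; \Gamma \vdash e : NF(\sigma)}\,(\textit{equiv})$$

and:

1. the rule R is an axiom, or

2. R is neither equiv nor trec, and $\mathcal{D}_1, \ldots, \mathcal{D}_n$ are normal, or

3. R is a use of trec where the sub-derivations $\mathcal{D}_1$ and $\mathcal{D}_2$ are of the form:

$$\dfrac{\mathcal{D}_1' \qquad \Delta \vdash NF(\{\mathsf{Int}/t\}\sigma) \equiv \{\mathsf{Int}/t\}\sigma}{\Delta; \Gamma \vdash e_i : \{\mathsf{Int}/t\}\sigma}\,(\textit{equiv})$$

and

$$\dfrac{\mathcal{D}_2' \qquad \Delta \vdash NF(\forall t_1, t_2{::}\Omega.\{t_1/t\}\sigma \to \{t_2/t\}\sigma \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma) \equiv \forall t_1, t_2{::}\Omega.\{t_1/t\}\sigma \to \{t_2/t\}\sigma \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma}{\Delta; \Gamma \vdash e_a : \forall t_1, t_2{::}\Omega.\{t_1/t\}\sigma \to \{t_2/t\}\sigma \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma}\,(\textit{equiv})$$

and $\mathcal{D}_1'$ and $\mathcal{D}_2'$ are normal.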

Theorem 4.1.29 $\Delta; \Gamma \vdash e : \sigma$ iff there exists a normal derivation of $\Delta; \Gamma \vdash e : NF(\sigma)$.

Proof: The “if” part is immediate: The normal derivation ends in $\Delta; \Gamma \vdash e : NF(\sigma)$. Since a type and its normal form are equivalent, a single additional application of the equiv rule yields a derivation of $\Delta; \Gamma \vdash e : \sigma$.

For the “only if” part, we use the following algorithm to transform the derivation of $\Delta; \Gamma \vdash e : \sigma$ into a normal derivation of $\Delta; \Gamma \vdash e : NF(\sigma)$. The algorithm is given by induction on the derivation $\mathcal{D}$ of $\Delta; \Gamma \vdash e : \sigma$.
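If $\mathcal{D}$ is an axiom, then we add an additional equiv rule, using the fact that $\Delta \vdash NF(\sigma) \equiv \sigma$. The resulting derivation is in normal form.

If $\mathcal{D}$ ends with an application of equiv,

$$\dfrac{\dfrac{\mathcal{D}_1 \;\cdots\; \mathcal{D}_n}{\Delta; \Gamma \vdash e : \sigma'}\,(R) \qquad \Delta \vdash \sigma' \equiv \sigma}{\Delta; \Gamma \vdash e : \sigma}\,(\textit{equiv})$$

then the normal form of $\sigma'$ must also be $NF(\sigma)$. Hence if $\mathcal{D}_1'$ through $\mathcal{D}_n'$ are the normal derivations of $\mathcal{D}_1$ through $\mathcal{D}_n$ respectively, the derivation

$$\dfrac{\dfrac{\mathcal{D}_1' \;\cdots\; \mathcal{D}_n'}{\Delta; \Gamma \vdash e : \sigma''}\,(R) \qquad \Delta \vdash \sigma'' \equiv NF(\sigma)}{\Delta; \Gamma \vdash e : NF(\sigma)}\,(\textit{equiv})$$

is normal.

If $\mathcal{D}$ ends in

$$\dfrac{\mathcal{D}_1 \;\cdots\; \mathcal{D}_n}{\Delta; \Gamma \vdash e : \sigma}\,(R)$$

where R is not equiv, we first transform $\mathcal{D}_1$ through $\mathcal{D}_n$ to normal derivations $\mathcal{D}_1'$ through $\mathcal{D}_n'$. I must now show that the rule R applies, followed by an application of equiv, yielding the normal judgment $\Delta; \Gamma \vdash e : NF(\sigma)$. There are five cases to consider: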
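case fn: $\mathcal{D}_1$ ends in $\Delta; \Gamma \uplus \{x{:}\sigma_1\} \vdash e : \sigma_2$. The normal derivation $\mathcal{D}_1'$ allows us to conclude that $\Delta; \Gamma \uplus \{x{:}\sigma_1\} \vdash e : NF(\sigma_2)$. Applying the fn rule, we can conclude that $\Delta; \Gamma \vdash \lambda x{:}\sigma_1.e : \sigma_1 \to NF(\sigma_2)$. Since $\Delta \vdash \sigma_2 \equiv NF(\sigma_2)$, $\Delta \vdash \sigma_1 \to NF(\sigma_2) \equiv \sigma \equiv NF(\sigma)$. Hence, following fn with equiv, we can conclude that $\Delta; \Gamma \vdash \lambda x{:}\sigma_1.e : NF(\sigma)$.

case app: $\mathcal{D}_1$ ends in $\Delta; \Gamma \vdash e_1 : \sigma_1 \to \sigma_2$ and $\mathcal{D}_2$ ends in $\Delta; \Gamma \vdash e_2 : \sigma_1$. The normal derivation $\mathcal{D}_1'$ ends in $\Delta; \Gamma \vdash e_1 : NF(\sigma_1 \to \sigma_2)$. Now, $\Delta \vdash NF(\sigma_1 \to \sigma_2) \equiv NF(\sigma_1) \to NF(\sigma_2)$. Hence, $\mathcal{D}_2'$ ends in $\Delta; \Gamma \vdash e_2 : NF(\sigma_1)$, and applying the app rule, we can conclude that $\Delta; \Gamma \vdash e_1\,e_2 : NF(\sigma_2)$. Since $\Delta \vdash NF(\sigma_2) \equiv \sigma_2$, and $\sigma_2 = \sigma$, we can apply the equiv rule yielding $\Delta; \Gamma \vdash e_1\,e_2 : NF(\sigma)$.

case tfn: $\mathcal{D}_1$ ends in $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash e_1 : \sigma_1$. The normal derivation $\mathcal{D}_1'$ ends in $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash e_1 : NF(\sigma_1)$. Applying the tfn rule, we can conclude that $\Delta; \Gamma \vdash \Lambda t{::}\kappa.e_1 : \forall t{::}\kappa.NF(\sigma_1)$. Since $\Delta \uplus \{t{::}\kappa\} \vdash \sigma_1 \equiv NF(\sigma_1)$, $\Delta \vdash \forall t{::}\kappa.\sigma_1 \equiv \forall t{::}\kappa.NF(\sigma_1) \equiv NF(\forall t{::}\kappa.\sigma_1)$. Thus, applying the equiv rule, we can conclude that $\Delta; \Gamma \vdash \Lambda t{::}\kappa.e_1 : NF(\sigma)$.

case tapp: $\mathcal{D}_1$ ends in $\Delta; \Gamma \vdash e_1 : \forall t{::}\kappa.\sigma_1$, whereas the normal derivation $\mathcal{D}_1'$ ends in $\Delta; \Gamma \vdash e_1 : NF(\forall t{::}\kappa.\sigma_1)$. Now, $\Delta \vdash NF(\forall t{::}\kappa.\sigma_1) \equiv \forall t{::}\kappa.NF(\sigma_1)$. Applying the tapp rule, we can conclude that $\Delta; \Gamma \vdash e_1\,[\mu] : \{\mu/t\}NF(\sigma_1)$. By equivalence of types under substitution of a well-formed constructor, we can conclude that $\Delta \vdash \{\mu/t\}\sigma_1 \equiv \{\mu/t\}NF(\sigma_1)$. Furthermore, $\Delta \vdash \{\mu/t\}NF(\sigma_1) \equiv NF(\{\mu/t\}\sigma_1)$. Thus, $\Delta \vdash \sigma \equiv NF(\sigma)$ and by an application of the equiv rule, we can conclude that $\Delta; \Gamma \vdash e_1\,[\mu] : NF(\sigma)$.

case trec: $\mathcal{D}_1$ ends in $\Delta; \Gamma \vdash e_i : \{\mathsf{Int}/t\}\sigma_1$ and $\mathcal{D}_2$ ends in $\Delta; \Gamma \vdash e_a : \forall t_1, t_2{::}\Omega.\{t_1/t\}\sigma_1 \to \{t_2/t\}\sigma_1 \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma_1$. By induction, the normal derivations corresponding to these two derivations are $\mathcal{D}_1'$, which ends in $\Delta; \Gamma \vdash e_i : NF(\{\mathsf{Int}/t\}\sigma_1)$, and $\mathcal{D}_2'$, which ends in $\Delta; \Gamma \vdash e_a : NF(\forall t_1, t_2{::}\Omega.\{t_1/t\}\sigma_1 \to \{t_2/t\}\sigma_1 \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma_1)$. Now $\Delta \vdash NF(\{\mathsf{Int}/t\}\sigma_1) \equiv \{\mathsf{Int}/t\}\sigma_1$ and $\Delta \vdash NF(\forall t_1, t_2{::}\Omega.\{t_1/t\}\sigma_1 \to \{t_2/t\}\sigma_1 \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma_1) \equiv \forall t_1, t_2{::}\Omega.\{t_1/t\}\sigma_1 \to \{t_2/t\}\sigma_1 \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma_1$. So, applying equiv to $\mathcal{D}_1'$ and $\mathcal{D}_2'$, followed by an application of trec, we can conclude that

$$\Delta; \Gamma \vdash \mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma_1](e_i; e_a) : \{\mu/t\}\sigma_1$$

Since $\Delta \vdash \{\mu/t\}\sigma_1 \equiv NF(\{\mu/t\}\sigma_1)$, we can apply equiv to this derivation yielding

$$\Delta; \Gamma \vdash \mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma_1](e_i; e_a) : NF(\sigma). \qquad \Box$$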

Theorem 4.1.30 Given $\Delta$, $\Gamma$, and $e$, where $\Gamma$ is well-formed with respect to $\Delta$, there is an algorithm to determine whether there exists a $\sigma$ such that $\Delta; \Gamma \vdash e : \sigma$.

Proof: By the previous theorem, the judgment $\Delta; \Gamma \vdash e : \sigma$ is derivable iff there is a normal derivation $\mathcal{D}$ of $\Delta; \Gamma \vdash e : NF(\sigma)$. I proceed by induction on $e$ to calculate such a derivation if it exists, and to signal an error otherwise. By an examination of the typing rules, it is clear that for each case, at most one rule other than equiv applies.

case var: If $e$ is a variable $x$, then the normal derivation is $\Delta; \Gamma \vdash x : \Gamma(x)$ followed by $\Delta; \Gamma \vdash x : NF(\Gamma(x))$. If $x$ does not occur in $\Gamma$, then $e$ is not well-typed.

case int: If $e$ is an integer $i$, then the normal derivation is $\Delta; \Gamma \vdash i : \mathsf{int}$, followed by a reflexive use of equivalence (i.e., $\mathsf{int} \equiv \mathsf{int}$).

case fn: If $e$ is a function $\lambda x{:}\sigma_1.e'$, then the bound variable can always be chosen via α-conversion so that it does not occur in the domain of $\Gamma$. If the free type variables of $\sigma_1$ are in $\Delta$, then $\Gamma \uplus \{x{:}\sigma_1\}$ is well-formed with respect to $\Delta$, else there is no derivation. By induction, there is an algorithm to calculate a normal derivation of $\Delta; \Gamma \uplus \{x{:}\sigma_1\} \vdash e' : NF(\sigma_2)$, if one exists. Given this derivation, by an application of the fn rule and a reflexive use of equiv, we can construct a normal derivation of $\Delta; \Gamma \vdash \lambda x{:}\sigma_1.e' : NF(\sigma_1 \to \sigma_2)$.

case tfn: If $e$ is $\Lambda t{::}\kappa.e'$, then the bound type variable can always be chosen via α-conversion so that it does not occur in the domain of $\Delta$. Hence, the context $\Delta \uplus \{t{::}\kappa\}; \Gamma$ is well-formed. By induction, there is an algorithm to calculate a normal derivation of $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash e' : NF(\sigma)$, if it exists. If so, applying tfn followed by a reflexive use of equiv, we can construct a normal derivation of $\Delta; \Gamma \vdash \Lambda t{::}\kappa.e' : NF(\forall t{::}\kappa.\sigma)$.

case app: If $e$ is an application $e_1\,e_2$, then by induction, we can calculate normal derivations of $\Delta; \Gamma \vdash e_1 : NF(\sigma_a)$ and $\Delta; \Gamma \vdash e_2 : NF(\sigma_b)$. If such derivations exist, then either $NF(\sigma_a)$ is of the form $\sigma_1 \to \sigma_2$ or else not. If not, then no other rule applies, so the term is ill-formed. If so, then for app to apply, $\sigma_1$ must be the same as $NF(\sigma_b)$. If this holds, then applying the app rule yields $\Delta; \Gamma \vdash e_1\,e_2 : \sigma_2$. Applying the equiv rule yields a normal derivation of $\Delta; \Gamma \vdash e_1\,e_2 : NF(\sigma_2)$.

case tapp: If $e$ is $e'\,[\mu]$, then by induction, we can calculate a normal derivation of $\Delta; \Gamma \vdash e' : NF(\forall t{::}\kappa.\sigma)$ if it exists and signal an error otherwise. If it exists, applying tapp yields a derivation of $\Delta; \Gamma \vdash e'\,[\mu] : \{\mu/t\}\sigma$. Following this derivation with a use of equiv yields a normal derivation of $\Delta; \Gamma \vdash e'\,[\mu] : NF(\{\mu/t\}\sigma)$.

case trec: If $e$ is $\mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma](e_i; e_a)$ then by induction, we can calculate normal derivations of $\Delta; \Gamma \vdash e_i : NF(\sigma_i)$ and $\Delta; \Gamma \vdash e_a : NF(\sigma_a)$ if such derivations exist. Since type equivalence is decidable, we can also determine if $\Delta \vdash NF(\sigma_i) \equiv \{\mathsf{Int}/t\}\sigma$ and $\Delta \uplus \{t_1{::}\Omega, t_2{::}\Omega\} \vdash NF(\sigma_a) \equiv \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma$. If so, then we can apply the equiv rule, followed by the trec rule to yield a derivation of $\Delta; \Gamma \vdash \mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma](e_i; e_a) : \{\mu/t\}\sigma$. Then, with another application of the equiv rule, we can build a normal derivation of $\Delta; \Gamma \vdash \mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma](e_i; e_a) : NF(\{\mu/t\}\sigma)$. □


Corollary 4.1.31 (Type Checking Decidable) There is an algorithm to determine whether there exists a $\sigma$ such that $\vdash e : \sigma$.

Proof: The type assignment $\emptyset$ is trivially well-formed with respect to the kind assignment $\emptyset$. Hence, by the previous theorem, we can calculate whether $\emptyset; \emptyset \vdash e : \sigma$ is derivable or not. □
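
To make the shape of this algorithm concrete, here is a minimal Python sketch of the var, int, fn, and app cases only. It rests on several assumptions that are not in the text: the tuple encoding, the names `infer`, `nf_type`, and `IllTyped`, and the restriction to constructors built from Int and Arrow, so that normalizing a type only needs to unfold $T(\cdot)$. Each case applies the single applicable rule and then normalizes the resulting type, mirroring the alternation of a non-equiv rule with equiv in a normal derivation. Because every computed type is normal, the check in the app case is a plain equality test.

```python
# Hypothetical encoding:
#   types: ("int",) | ("T", m) | ("fun", s1, s2), with m built from
#          ("int",) and ("arrow", m1, m2) only (an assumption of this sketch)
#   terms: ("var", x) | ("lit", n) | ("fn", x, s1, body) | ("app", e1, e2)

class IllTyped(Exception):
    """Raised when no typing rule applies."""

def nf_type(s):
    """Normal form of a type in the restricted fragment."""
    if s[0] == "fun":
        return ("fun", nf_type(s[1]), nf_type(s[2]))
    if s[0] == "T":
        m = s[1]
        if m[0] == "int":
            return ("int",)
        if m[0] == "arrow":
            return ("fun", nf_type(("T", m[1])), nf_type(("T", m[2])))
    return s

def infer(env, e):
    """Return the normal type of e under env, or raise IllTyped."""
    tag = e[0]
    if tag == "var":
        if e[1] not in env:
            raise IllTyped(f"unbound variable {e[1]}")
        return nf_type(env[e[1]])      # var rule, then equiv
    if tag == "lit":
        return ("int",)                # int rule, reflexive equiv
    if tag == "fn":
        _, x, s1, body = e
        s2 = infer({**env, x: s1}, body)
        return nf_type(("fun", s1, s2))  # fn rule, then equiv
    if tag == "app":
        f_ty = infer(env, e[1])
        a_ty = infer(env, e[2])
        # app applies only if the function type is an arrow whose
        # domain equals the (normal) argument type
        if f_ty[0] != "fun" or f_ty[1] != a_ty:
            raise IllTyped("application mismatch")
        return f_ty[2]
    raise IllTyped(f"unknown term {e!r}")

# (fn x : T(Int). x) 3 has normal type int
ident = ("fn", "x", ("T", ("int",)), ("var", "x"))
print(infer({}, ("app", ident, ("lit", 3))))
```

The remaining cases (tfn, tapp, trec) would additionally need constructor substitution and kind checking, which this sketch leaves out.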


4.2  $\lambda_i^{ML}$ Type Soundness

In this section I prove that the type system for $\lambda_i^{ML}$ is sound. My proof is a syntactic one in the style of Wright and Felleisen [130]. The basic idea is to show that every well-formed program $e$ of type $\sigma$ is a value of type $\sigma$, or else there exists a unique $e'$ (modulo α-conversion), such that $e$ steps to $e'$ and $e'$ has type $\sigma$. Consequently, no well-formed $\lambda_i^{ML}$ program ever becomes “stuck”. The notion of stuck computations is captured by the following definition.

Definition 4.2.1 (Stuck Constructors and Expressions) A constructor is stuck if it is of one of the following forms:

1. $u\ \mu$ ($u$ is not a λ-constructor).

2. $\mathsf{Typerec}\ u\ \mathsf{of}\ (\mu_i; \mu_a)$ ($u$ is not of the form Int or $\mathsf{Arrow}(u_1, u_2)$).

An expression is stuck if it is of one of the following forms:

1. $v_1\,v_2$ ($v_1$ is not a λ-expression).

2. $v\,[\mu]$ ($v$ is not a Λ-expression).

3. $\mathsf{typerec}\ U[\mu]\ \mathsf{of}\ [t.\sigma](e_i; e_a)$ ($\mu$ is stuck).

4. $\mathsf{typerec}\ u\ \mathsf{of}\ [t.\sigma](e_i; e_a)$ ($u$ is not of the form Int or $\mathsf{Arrow}(u_1, u_2)$).

Lemma 4.2.2 (Unique Decomposition of Constructors) A closed constructor $\mu$ is either a constructor value $u$, or else $\mu$ can be decomposed into a unique $U$ and $\mu'$ such that $\mu = U[\mu']$ where $\mu'$ is either an instruction or is stuck.

Proof: By induction on the structure of constructors. If $\mu$ is Int, $\mathsf{Arrow}(u_1, u_2)$, or $\lambda t{::}\kappa.\,\mu'$ then $\mu$ is a value. Hence, there are only two cases to consider.

case: Suppose $\mu$ is $\mu_1\,\mu_2$. There are three sub-cases to consider. First, if $\mu_1$ and $\mu_2$ are values $u_1$ and $u_2$, then the only decomposition of $\mu$ is an empty context $U = [\,]$ filled with $u_1\,u_2$. If $u_1$ is a λ-constructor, then $\mu$ is an instruction, else $\mu$ is stuck.

Second, if $\mu_1$ is a value $u_1$, but $\mu_2$ is not a value, then by induction, there is a unique $U_2$ and $\mu_2'$ such that $\mu_2 = U_2[\mu_2']$ and $\mu_2'$ is either stuck or an instruction. Hence, the only decomposition of $\mu$ is $u_1\,U_2[\mu_2']$.

Third, if $\mu_1$ is not a value, then by induction, there exists a unique $U_1$ and $\mu_1'$ such that $\mu_1 = U_1[\mu_1']$ and $\mu_1'$ is either an instruction or stuck. Hence, the only decomposition of $\mu$ is $U_1[\mu_1']\,\mu_2$.

case: Suppose $\mu$ is $\mathsf{Typerec}\ \mu_1\ \mathsf{of}\ (\mu_i; \mu_a)$. If $\mu_1$ is a value $u$, then the only decomposition of $\mu$ is an empty context $U = [\,]$ filled with $\mu$. If $u$ is either Int or $\mathsf{Arrow}(u_1, u_2)$, then $\mu$ is an instruction, else $\mu$ is stuck.

If $\mu_1$ is not a value, then by induction, there is a unique $U_1$ and $\mu_1'$ such that $\mu_1 = U_1[\mu_1']$ and $\mu_1'$ is either an instruction or stuck. Hence, the only decomposition of $\mu$ is $\mathsf{Typerec}\ U_1[\mu_1']\ \mathsf{of}\ (\mu_i; \mu_a)$. □
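
The case analysis in this proof is itself an algorithm. The Python sketch below follows it directly; the tuple encoding and the names `is_value` and `decompose` are hypothetical, contexts are represented as plug functions, and the treatment of an Arrow with non-value components (evaluated left to right) is an assumption of the sketch, since the proof does not spell out that case.

```python
# Hypothetical encoding of closed constructors (kinds omitted):
#   ("int",) | ("arrow", m1, m2) | ("lam", t, m)
#   ("app", m1, m2) | ("typerec", m, m_i, m_a)

def is_value(m):
    if m[0] in ("int", "lam"):
        return True
    if m[0] == "arrow":
        return is_value(m[1]) and is_value(m[2])
    return False

def decompose(m):
    """Return None if m is a value, else (plug, hole, status) where
    plug(hole) rebuilds m and status is "instruction" or "stuck"."""
    if is_value(m):
        return None
    tag = m[0]
    if tag == "arrow":  # assumption: Arrow components evaluated left to right
        if not is_value(m[1]):
            inner, hole, status = decompose(m[1])
            return (lambda h: ("arrow", inner(h), m[2])), hole, status
        inner, hole, status = decompose(m[2])
        return (lambda h: ("arrow", m[1], inner(h))), hole, status
    if tag == "app":
        f, a = m[1], m[2]
        if not is_value(f):           # third sub-case: U1[mu1'] mu2
            inner, hole, status = decompose(f)
            return (lambda h: ("app", inner(h), a)), hole, status
        if not is_value(a):           # second sub-case: u1 U2[mu2']
            inner, hole, status = decompose(a)
            return (lambda h: ("app", f, inner(h))), hole, status
        status = "instruction" if f[0] == "lam" else "stuck"
        return (lambda h: h), m, status   # first sub-case: empty context
    if tag == "typerec":
        s = m[1]
        if not is_value(s):
            inner, hole, status = decompose(s)
            return (lambda h: ("typerec", inner(h), m[2], m[3])), hole, status
        status = "instruction" if s[0] in ("int", "arrow") else "stuck"
        return (lambda h: h), m, status
    raise ValueError("open or malformed constructor")

INT = ("int",)
ident = ("lam", "t", ("var", "t"))
redex = ("typerec", INT, INT, ident)
whole = ("app", ident, redex)
plug, hole, status = decompose(whole)
print(hole, status)
```

Because each branch is selected by a deterministic test on the shape of the constructor, the function returns at most one decomposition, which is the uniqueness claim of the lemma.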

Lemma 4.2.3 (Unique Decomposition of Expressions) A closed expression $e$ is either a value $v$, or else $e$ can be decomposed into a unique $E$ and $e'$ such that $e = E[e']$ where $e'$ is either an instruction or is stuck.

Proof: By induction on the structure of expressions. The argument is similar to the one for constructors. □
Lemma 4.2.4 (Determinacy) For any closed expression $e$, there is at most one value $v$ such that $e \longmapsto^* v$.

Proof: By unique decomposition, if $e$ is not a value, there is at most one $E$ and $I$ such that $e = E[I]$ where $I$ is an instruction. Since each instruction has at most one rewriting rule, there is at most one $e'$ such that $e \longmapsto e'$. □

The  following  lemma  allows  us  to  take  advantage  of  some  of  the  properties  I  proved 
about  constructor  rewriting  in  the  context  of  constructor  evaluation. 

Lemma 4.2.5 If $\mu \longmapsto \mu'$, then $\mu \to \mu'$.

Proof: By the fact that $U$ contexts are a subset of $C$ contexts (see Section 4.1.1), and the three evaluation rules of constructors correspond precisely to β, t1, and t2, respectively. □

Corollary  4.2.6 

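1. If $\vdash \mu :: \kappa$ and $\mu \longmapsto \mu'$, then $\vdash \mu' :: \kappa$.

2. If $\vdash \mu :: \kappa$ and $\mu \longmapsto \mu'$, then $\vdash \mu \equiv \mu' :: \kappa$.

3. If $\Delta \uplus \{t{::}\kappa\} \vdash \sigma$ and $\Delta \vdash \mu :: \kappa$, then $\Delta \vdash \{\mu/t\}\sigma$.

4. If $\Delta \uplus \{t{::}\kappa\} \vdash \sigma$ and $\Delta \vdash \mu \equiv \mu' :: \kappa$, then $\Delta \vdash \{\mu/t\}\sigma \equiv \{\mu'/t\}\sigma$.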

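Lemma 4.2.7 (Constructor Substitution) If $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash e : \sigma$, and $\Delta \vdash \mu :: \kappa$, then $\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}e : \{\mu/t\}\sigma$.

Proof: By induction on the normal derivation of $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash e : NF(\sigma)$. In each case, we back up the derivation one step to the application of the non-equiv rule.

case var: We have $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash x : \Gamma(x)$. Thus, $\Delta; \{\mu/t\}\Gamma \vdash x : \{\mu/t\}(\Gamma(x))$.

case int: We have $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash i : \mathsf{int}$. Thus, $\Delta; \{\mu/t\}\Gamma \vdash i : \{\mu/t\}\mathsf{int}$.

case fn: We have $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash \lambda x{:}\sigma_1.e : \sigma_1 \to \sigma_2$. By induction, $\Delta; \{\mu/t\}(\Gamma \uplus \{x{:}\sigma_1\}) \vdash \{\mu/t\}e : \{\mu/t\}\sigma_2$. Thus, $\Delta; \{\mu/t\}\Gamma \uplus \{x{:}\{\mu/t\}\sigma_1\} \vdash \{\mu/t\}e : \{\mu/t\}\sigma_2$. By the fn rule, $\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}(\lambda x{:}\sigma_1.e) : \{\mu/t\}(\sigma_1 \to \sigma_2)$.

case tfn: We have $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash \Lambda t'{::}\kappa'.e : \forall t'{::}\kappa'.\sigma$. By induction, $\Delta \uplus \{t'{::}\kappa'\}; \{\mu/t\}\Gamma \vdash \{\mu/t\}e : \{\mu/t\}\sigma$, since $t'$ can always be chosen distinct from $t$. By the tfn rule, $\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}\Lambda t'{::}\kappa'.e : \{\mu/t\}\forall t'{::}\kappa'.\sigma$.

case app: We have $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash e_1\,e_2 : \sigma$. By induction, $\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}e_1 : \{\mu/t\}(\sigma_1 \to \sigma)$ and $\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}e_2 : \{\mu/t\}\sigma_1$. By the app rule, $\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}(e_1\,e_2) : \{\mu/t\}\sigma$.

case tapp: We have $\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash e_1\,[\mu'] : \{\mu'/t'\}\sigma$. By induction, $\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}e_1 : \{\mu/t\}(\forall t'{::}\kappa'.\sigma)$, where $t'$ is chosen distinct from $t$ via α-conversion. By the tapp rule, $\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}(e_1\,[\mu']) : \{(\{\mu/t\}\mu')/t'\}(\{\mu/t\}\sigma)$.

case trec: We have

$$\Delta \uplus \{t{::}\kappa\}; \Gamma \vdash \mathsf{typerec}\ \mu'\ \mathsf{of}\ [t'.\sigma](e_i; e_a) : \{\mu'/t'\}\sigma.$$

By induction,

$$\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}e_i : \{\mathsf{Int}/t'\}(\{\mu/t\}\sigma)$$

and

$$\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}e_a : \forall t_1, t_2{::}\Omega.\{t_1/t'\}(\{\mu/t\}\sigma) \to \{t_2/t'\}(\{\mu/t\}\sigma) \to \{\mathsf{Arrow}(t_1, t_2)/t'\}(\{\mu/t\}\sigma).$$

By the trec rule, since $t'$ and $t$ can always be chosen to be distinct via α-conversion,

$$\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}(\mathsf{typerec}\ \mu'\ \mathsf{of}\ [t'.\sigma](e_i; e_a)) : \{(\{\mu/t\}\mu')/t'\}(\{\mu/t\}\sigma).$$

Thus,

$$\Delta; \{\mu/t\}\Gamma \vdash \{\mu/t\}e : \{\mu/t\}(\{\mu'/t'\}\sigma). \qquad \Box$$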


Lemma 4.2.8 (Expression Substitution) If $\Delta; \Gamma \uplus \{x{:}\sigma_1\} \vdash e : \sigma$, and $\Delta; \Gamma \vdash e_1 : \sigma_1$, then $\Delta; \Gamma \vdash \{e_1/x\}e : \sigma$.

Proof: By induction on the normal derivation of $\Delta; \Gamma \uplus \{x{:}\sigma_1\} \vdash e : NF(\sigma)$. We simply replace all derivations of $\Delta; \Gamma \vdash x : \sigma_1$ with a copy of the derivation of $\Delta; \Gamma \vdash e_1 : \sigma_1$. The proof relies upon weakening the context $\Delta; \Gamma$ at these points to include the free variables that are in scope. □

Lemma 4.2.9 If $\vdash E[e] : \sigma$, then there exists a $\sigma'$ such that $\vdash e : \sigma'$, and for all $e'$ such that $\vdash e' : \sigma'$, $\vdash E[e'] : \sigma$.

Proof: By induction on the normal derivation of $\vdash E[e] : \sigma$. In each case, we back up the derivation one step to the application of the non-equiv rule.

If $E$ is empty, the result holds trivially with $\sigma = \sigma'$. Otherwise, there are three cases to consider:

case app1: $E[e]$ is of the form $E_1[e]\,e_2$. By the typing rules, the normal derivation ends in a use of app. Hence, $\vdash E_1[e] : \sigma_1 \to \sigma$ and $\vdash e_2 : \sigma_1$. By induction, there exists a $\sigma'$ such that $\vdash e : \sigma'$ and for all $\vdash e' : \sigma'$, $\vdash E_1[e'] : \sigma_1 \to \sigma$. Hence, by the app rule, there are derivations of $\vdash E_1[e]\,e_2 : \sigma$ and $\vdash E_1[e']\,e_2 : \sigma$.

case app2: $E[e]$ is of the form $v_1\,E_2[e]$. By the typing rules, the normal derivation ends in a use of app. Hence, $\vdash v_1 : \sigma_1 \to \sigma$ and $\vdash E_2[e] : \sigma_1$. By induction, there exists a $\sigma'$ such that $\vdash e : \sigma'$ and for all $\vdash e' : \sigma'$, $\vdash E_2[e'] : \sigma_1$. Hence, by the app rule, there are derivations of $\vdash v_1\,E_2[e] : \sigma$ and $\vdash v_1\,E_2[e'] : \sigma$.

case tapp: $E[e]$ is of the form $E_1[e]\,[\mu]$. By the typing rules, the normal derivation ends in a use of tapp. Hence, $\vdash E_1[e] : \forall t{::}\kappa.\sigma_1$, where $\sigma$ is $\{\mu/t\}\sigma_1$. By induction, there exists a $\sigma'$ such that $\vdash e : \sigma'$ and for all $\vdash e' : \sigma'$, $\vdash E_1[e'] : \forall t{::}\kappa.\sigma_1$. Hence, by the tapp rule, there are derivations of $\vdash E_1[e]\,[\mu] : \sigma$ and $\vdash E_1[e']\,[\mu] : \sigma$. □

Lemma 4.2.10 (Type Preservation) If $\vdash e_1 : \sigma$ and $e_1 \longmapsto e_2$, then $\vdash e_2 : \sigma$.

Proof: By unique decomposition, there is a unique $E$ and $I$ such that $e_1 = E[I]$. Thus, $E[I] \longmapsto E[e']$ where $e_2 = E[e']$. By the previous lemma, there exists a $\sigma'$ such that $\vdash I : \sigma'$, and it suffices to show that $\vdash e' : \sigma'$. There are six basic cases to consider, depending upon $I$. In each case, we use the normal derivation of $\vdash I : \sigma'$, backing up to the last use of a non-equiv rule.

case: $I$ is $(\lambda x{:}\sigma_1.e'')\,v$ and thus $e'$ is $\{v/x\}e''$. The only way that $\vdash I : \sigma'$ can be derived is by the app rule. Thus, $\vdash \lambda x{:}\sigma_1.e'' : \sigma_1 \to \sigma'$ and $\vdash v : \sigma_1$. Any normal derivation of $\vdash \lambda x{:}\sigma_1.e'' : \sigma_1 \to \sigma'$ must end with a use of fn followed by an equiv. Hence, $\emptyset; \{x{:}\sigma_1\} \vdash e'' : NF(\sigma')$. Therefore, by the Expression Substitution Lemma, $\vdash \{v/x\}e'' : NF(\sigma')$, and since each type is equivalent to its normal form, $\vdash e' : \sigma'$.

case: $I$ is $(\Lambda t{::}\kappa.e'')[U[J]]$ and $e'$ is $(\Lambda t{::}\kappa.e'')[U[\mu]]$, where $U[J] \longmapsto U[\mu]$. Any derivation of $\vdash I : \sigma'$ must end with the tapp rule. Hence, $\vdash \Lambda t{::}\kappa.e'' : \forall t{::}\kappa.\sigma_1$, $\vdash U[J] :: \kappa$, and $\sigma' = \{U[J]/t\}\sigma_1$. By Kind Preservation (lemma 4.1.5), $\vdash U[\mu] :: \kappa$. Thus, by the tapp rule, $\vdash (\Lambda t{::}\kappa.e'')[U[\mu]] : \{U[\mu]/t\}\sigma_1$. By equivalence of constructors under reduction, $\vdash U[J] \equiv U[\mu]$. Therefore, by equivalence of types under substitution of equivalent constructors, $\vdash \{U[J]/t\}\sigma_1 \equiv \{U[\mu]/t\}\sigma_1$. Thus, by the equiv typing rule, $\vdash (\Lambda t{::}\kappa.e'')[U[\mu]] : \{U[J]/t\}\sigma_1$.

case: $I$ is $(\Lambda t{::}\kappa.e'')[u]$ and $e'$ is $\{u/t\}e''$. The last step in the normal derivation of $\vdash I : \sigma'$ must be a use of tapp. Thus, $\vdash \Lambda t{::}\kappa.e'' : \forall t{::}\kappa.\sigma_1$ and $\sigma' = \{u/t\}\sigma_1$. Therefore, by the Constructor Substitution Lemma, $\vdash \{u/t\}e'' : \{u/t\}NF(\sigma_1)$. By equivalence of types under substitution of equivalent constructors, $\vdash \{u/t\}NF(\sigma_1) \equiv \{u/t\}\sigma_1 = \sigma'$. Thus, by the equiv typing rule, $\vdash \{u/t\}e'' : \sigma'$.

case: $I$ is $\mathsf{typerec}\ U[J]\ \mathsf{of}\ [t.\sigma_1](e_i; e_a)$ and $e'$ is $\mathsf{typerec}\ U[\mu]\ \mathsf{of}\ [t.\sigma_1](e_i; e_a)$. The last step in the normal derivation of $\vdash I : \sigma'$ must be a use of trec. Thus, $\vdash U[J] :: \Omega$ and by kind preservation, $\vdash U[\mu] :: \Omega$. Therefore, by trec, $\vdash \mathsf{typerec}\ U[\mu]\ \mathsf{of}\ [t.\sigma_1](e_i; e_a) : \sigma'$.

case: $I$ is $\mathsf{typerec}\ \mathsf{Int}\ \mathsf{of}\ [t.\sigma_1](e_i; e_a)$ and $e'$ is $e_i$. The last step in the normal derivation of $\vdash I : \sigma'$ must be a use of trec. Thus, $\sigma'$ is equivalent to $\{\mathsf{Int}/t\}\sigma_1$. By the $e_i$ typing hypothesis, $\vdash e_i : \{\mathsf{Int}/t\}\sigma_1$. Thus, $\vdash e' : \sigma'$.

case: $I$ is $\mathsf{typerec}\ \mathsf{Arrow}(u_1, u_2)\ \mathsf{of}\ [t.\sigma_1](e_i; e_a)$ and $e'$ is

$$e_a[u_1][u_2]\ (\mathsf{typerec}\ u_1\ \mathsf{of}\ [t.\sigma_1](e_i; e_a))\ (\mathsf{typerec}\ u_2\ \mathsf{of}\ [t.\sigma_1](e_i; e_a)).$$

The last step in the normal derivation of $\vdash I : \sigma'$ must be a use of trec. Thus, $\sigma'$ is equivalent to $\{\mathsf{Arrow}(u_1, u_2)/t\}\sigma_1$. By the $e_a$ typing hypothesis,

$$\vdash e_a : \forall t_1, t_2{::}\Omega.\{t_1/t\}\sigma_1 \to \{t_2/t\}\sigma_1 \to \{\mathsf{Arrow}(t_1, t_2)/t\}\sigma_1.$$

Thus,

$$\vdash e_a[u_1][u_2] : \{u_1/t\}\sigma_1 \to \{u_2/t\}\sigma_1 \to \{\mathsf{Arrow}(u_1, u_2)/t\}\sigma_1.$$

Therefore, $\vdash e' : \{\mathsf{Arrow}(u_1, u_2)/t\}\sigma_1$, and $\vdash e' : \sigma'$. □

Lemma 4.2.11 (Canonical Forms) If $\vdash v : \sigma$ then:

• If $\vdash \sigma \equiv \mathsf{int}$, then $v = i$ for some integer $i$.

• If $\vdash \sigma \equiv \sigma_1 \to \sigma_2$, then $v$ is of the form $\lambda x{:}\sigma_1.e$, for some $x$ and $e$.

• If $\vdash \sigma \equiv \forall t{::}\kappa.\sigma'$, then $v$ is of the form $\Lambda t{::}\kappa.e$, for some $e$.

Proof: If $\vdash v : \sigma$, then there is a normal derivation of $\vdash v : NF(\sigma)$. Backing up this derivation by one rule, it is easy to see by the definition of values that only one rule applies. □

Lemma 4.2.12 (Constructor Progress) If $\vdash \mu_1 :: \kappa$, then either $\mu_1$ is a constructor value $u$ or else there exists a $\mu_2$ such that $\mu_1 \longmapsto \mu_2$.

Proof: If $\mu_1$ is not a value, then there is a unique decomposition into $U[\mu]$ for some context $U$ and closed constructor $\mu$. I argue that $\mu$ must be an instruction. Since $\vdash U[\mu] :: \kappa$, there exists a $\kappa'$ such that $\vdash \mu :: \kappa'$.

If $\mu$ is of the form $\mu_1\,\mu_2$, then $\mu_1$ and $\mu_2$ must be values $u_1$ and $u_2$ respectively. Since $\vdash u_1\,u_2 :: \kappa'$, this must be derived using the app rule. Thus, there exists a $\kappa_1$ such that $\vdash u_1 :: \kappa_1 \to \kappa'$ and this must be derived via the fn rule. Therefore, $u_1$ must have the form $\lambda t{::}\kappa_1.\,\mu'$ and thus $u_1\,u_2 \longmapsto \{u_2/t\}\mu'$.

If $\mu$ is of the form $\mathsf{Typerec}\ \mu'\ \mathsf{of}\ (\mu_i; \mu_a)$, then $\mu'$ must be a value $u$. Since $\vdash \mu :: \kappa'$, a derivation of this fact must end in a use of trec. Hence, $\vdash u :: \Omega$, and $u$ is either Int or $\mathsf{Arrow}(u_1, u_2)$. In the former case, $\mu \longmapsto \mu_i$ and in the latter case, $\mu \longmapsto \mu_a\ u_1\ u_2\ (\mathsf{Typerec}\ u_1\ \mathsf{of}\ (\mu_i; \mu_a))\ (\mathsf{Typerec}\ u_2\ \mathsf{of}\ (\mu_i; \mu_a))$. □
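
The progress argument can be exercised with a small evaluator. The Python sketch below is an illustration under assumptions, not the thesis's definition: the tuple encoding and the names `step`, `evaluate`, and `Stuck` are hypothetical, kinds are omitted, and Arrow components are evaluated left to right. `step` performs one evaluation step on a closed constructor and raises `Stuck` exactly on the stuck forms; on well-formed input the lemma says this exception is never raised, and strong normalization implies the loop in `evaluate` terminates.

```python
# Hypothetical encoding of closed constructors (kinds omitted):
#   ("var", t) | ("int",) | ("arrow", m1, m2) | ("lam", t, m)
#   ("app", m1, m2) | ("typerec", m, m_i, m_a)

class Stuck(Exception):
    """Raised when a non-value constructor has no evaluation step."""

def is_value(m):
    if m[0] in ("int", "lam"):
        return True
    if m[0] == "arrow":
        return is_value(m[1]) and is_value(m[2])
    return False

def subst(m, t, u):
    """{u/t}m for a closed value u, so no variable capture can occur."""
    tag = m[0]
    if tag == "var":
        return u if m[1] == t else m
    if tag == "int":
        return m
    if tag == "lam":
        return m if m[1] == t else ("lam", m[1], subst(m[2], t, u))
    return (tag,) + tuple(subst(c, t, u) for c in m[1:])

def step(m):
    """One evaluation step of a non-value closed constructor."""
    tag = m[0]
    if tag == "arrow":  # assumption: components evaluated left to right
        if not is_value(m[1]):
            return ("arrow", step(m[1]), m[2])
        return ("arrow", m[1], step(m[2]))
    if tag == "app":
        f, a = m[1], m[2]
        if not is_value(f):
            return ("app", step(f), a)
        if not is_value(a):
            return ("app", f, step(a))
        if f[0] == "lam":               # beta
            return subst(f[2], f[1], a)
        raise Stuck(m)
    if tag == "typerec":
        s, mi, ma = m[1], m[2], m[3]
        if not is_value(s):
            return ("typerec", step(s), mi, ma)
        if s[0] == "int":               # Typerec Int of (m_i; m_a) -> m_i
            return mi
        if s[0] == "arrow":             # unfold on Arrow(u1, u2)
            r1 = ("typerec", s[1], mi, ma)
            r2 = ("typerec", s[2], mi, ma)
            return ("app", ("app", ("app", ("app", ma, s[1]), s[2]), r1), r2)
        raise Stuck(m)
    raise Stuck(m)

def evaluate(m):
    while not is_value(m):
        m = step(m)
    return m

INT = ("int",)
# Arrow case that swaps the two recursive results.
swap = ("lam", "a", ("lam", "b", ("lam", "ra", ("lam", "rb",
        ("arrow", ("var", "rb"), ("var", "ra"))))))
prog = ("typerec", ("arrow", INT, ("arrow", INT, INT)), INT, swap)
print(evaluate(prog))
```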
Lemma 4.2.13 (Expression Progress) If $\vdash e_1 : \sigma$, then either $e_1$ is a value or else there exists an $e_2$ such that $e_1 \longmapsto e_2$.

Proof: If $e_1$ is not a value, then by Unique Decomposition, there exists an $E$ and $e$ such that $e_1 = E[e]$ and $e$ is either stuck or else $e$ is an instruction. I argue that $e$ must be an instruction and thus there exists an $e'$ such that $e \longmapsto e'$ and thus $E[e] \longmapsto e_2$ where $e_2 = E[e']$. Since $\vdash E[e] : \sigma$, there exists a $\sigma'$ such that $\vdash e : \sigma'$.

case: If $e$ is of the form $e_1\,e_2$ then both $e_1$ and $e_2$ must be values, $v_1$ and $v_2$. Since $\vdash v_1\,v_2 : \sigma'$, there is a normal derivation of $\vdash v_1\,v_2 : NF(\sigma')$ where the second to the last step uses the app rule. Hence, there is a $\sigma_1$ such that $\vdash v_1 : \sigma_1 \to \sigma'$ and $\vdash v_2 : \sigma_1$. A normal derivation of $\vdash v_1 : \sigma_1 \to \sigma'$ must have a second to last step that uses the fn rule. By Canonical Forms, $v_1$ must be of the form $\lambda x{:}\sigma_1.e''$. Therefore, $v_1\,v_2 \longmapsto \{v_2/x\}e''$.

case: If $e$ is of the form $e_1\,[\mu]$ then $e_1$ must be a value $v_1$. The second to the last step of a normal derivation of $\vdash v_1\,[\mu] : \sigma'$ must use the tapp rule. Thus, there exists some $\kappa$ such that $\vdash \mu :: \kappa$ and $\vdash v_1 : \forall t{::}\kappa.\sigma_1$ where $\{\mu/t\}\sigma_1$ is equivalent to $\sigma'$. Any normal derivation of $\vdash v_1 : \forall t{::}\kappa.\sigma_1$ must have a second to last step that uses the tfn rule. By Canonical Forms, $v_1$ must be of the form $\Lambda t{::}\kappa.e''$. If $\mu$ is a constructor value $u$, then $v_1\,[u] \longmapsto \{u/t\}e''$. Otherwise, by constructor progress, there exists a $\mu''$ such that $\mu \longmapsto \mu''$. Therefore, $v_1\,[\mu] \longmapsto v_1\,[\mu'']$.

case: If $e$ is of the form $\mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma_1](e_i; e_a)$, then by constructor progress $\mu$ is either a constructor value $u$ or else there exists a $\mu'$ such that $\mu \longmapsto \mu'$ and thus $e \longmapsto \mathsf{typerec}\ \mu'\ \mathsf{of}\ [t.\sigma_1](e_i; e_a)$. If $\mu$ is a constructor value $u$, then the second to last step of any normal derivation of $\vdash e : \sigma'$ must be an application of trec. Hence, $\vdash u :: \Omega$. Therefore, $u$ is either Int or $\mathsf{Arrow}(u_1, u_2)$ for some $u_1$ and $u_2$. In the former case, $e \longmapsto e_i$ and in the latter case,

$$e \longmapsto e_a[u_1][u_2]\ (\mathsf{typerec}\ u_1\ \mathsf{of}\ [t.\sigma_1](e_i; e_a))\ (\mathsf{typerec}\ u_2\ \mathsf{of}\ [t.\sigma_1](e_i; e_a)). \qquad \Box$$


Corollary 4.2.14 (Stuck Programs Untypeable) If $\vdash e : \sigma$, then $e$ is not stuck.

Proof: If $\vdash e : \sigma$, then by progress, either $e$ is a value or else $e \longmapsto e'$ for some $e'$. If $e \longmapsto e'$, then by preservation, $\vdash e' : \sigma$. □

Corollary 4.2.15 (Soundness) If $\vdash e : \sigma$ then $e$ cannot become stuck.

Proof: I argue by induction on $n$ that if $e \longmapsto^n e'$, then $e'$ is not stuck. For $n = 0$, the result holds by the previous corollary. Suppose $e \longmapsto^n e'$. By the induction hypothesis, $e'$ is not stuck. Hence, $e'$ is either a value or else $e' \longmapsto e''$. By Preservation, $\vdash e'' : \sigma$. Thus, by the previous corollary, $e''$ is not stuck. □


Chapter  5 

Compiling with Dynamic Type Dispatch


In this chapter, I show how to compile Mini-ML to a variant of $\lambda^{ML}_i$, called $\lambda^{ML}_i$-Rep.
The  primary  purpose  of  the  translation  is  to  give  a  simple,  but  compelling  demonstration 
of  the  power  of  dynamic  type  dispatch.  A  secondary  purpose  is  to  demonstrate  the 
methodology  of  type-directed  compilation  and  present  a  proof  of  translation  correctness. 

I also address two real implementation issues with the translation. First, I show how to eliminate structural, polymorphic equality by using a combination of primitive equality operations and dynamic type dispatch. As a result, the target language does not need specialized support for dispatching on values. This implies that $\lambda^{ML}_i$-Rep does not need tags on values to support polymorphic equality.

Second, I flatten 1-argument functions into multiple-argument functions when possible. Recall that Mini-ML, like SML, provides only 1-argument functions. We can simulate multiple arguments by passing a tuple to a function, but it is best to pass multiple arguments directly in registers since access to registers is typically faster than access to an allocated object. Therefore, if a source function takes a tuple as an argument, I translate it so that the components of the tuple are passed directly as multiple, unallocated arguments. I use dynamic type dispatch to determine whether to flatten a function that takes an argument of unknown type.

In  practice,  flattening  function  arguments  yields  a  substantial  speedup  for  SML  code 
and  significantly  reduces  allocation.  Much  of  the  improvement  claimed  by  Shao  and 
Appel  for  their  implementation  of  Leroy-style  representation  analysis  is  due  to  argument 
flattening  [110].  In  particular,  they  reduced  total  execution  time  by  11%  on  average  and 
allocation  by  30%  on  average.  I  found  similar  performance  advantages  with  argument 
flattening  in  the  context  of  the  TIL  compiler  (see  Chapter  8). 

After demonstrating how multi-argument functions and polymorphic equality may be implemented in $\lambda^{ML}_i$, I sketch how other language constructs, notably C-style structs, Haskell-style type classes, and polymorphic communication primitives, can be coded using dynamic type dispatch. The rest of this chapter proceeds as follows: In Section 5.1, I define the target language, $\lambda^{ML}_i$-Rep. In Section 5.2, I define a type-directed translation from Mini-ML to $\lambda^{ML}_i$-Rep that eliminates equality and flattens function arguments. In Section 5.3, I prove the correctness of this translation. Finally, in Section 5.4, I demonstrate how other constructs may be implemented through dynamic type dispatch.

\[
\begin{array}{llcl}
\text{(kinds)} & \kappa & ::= & \Omega \mid \kappa_1 \to \kappa_2 \\[4pt]
\text{(constructors)} & \mu & ::= & t \mid \mathsf{Int} \mid \mathsf{Float} \mid \mathsf{Unit} \mid \mathsf{Prod}(\mu_1, \mu_2) \mid \mathsf{Arrow}([\mu_1, \ldots, \mu_k], \mu) \mid {} \\
 & & & \lambda t{::}\kappa.\,\mu \mid \mu_1\,\mu_2 \mid \mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a) \\[4pt]
\text{(types)} & \sigma & ::= & T(\mu) \mid \mathsf{int} \mid \mathsf{float} \mid \mathsf{unit} \mid (\sigma_1 \times \sigma_2) \mid [\sigma_1, \ldots, \sigma_k] \to \sigma \mid \forall t{::}\kappa.\,\sigma \\[4pt]
\text{(expressions)} & e & ::= & x \mid i \mid f \mid () \mid (e_1, e_2) \mid \pi_1\,e \mid \pi_2\,e \mid \lambda[x_1{:}\sigma_1, \ldots, x_k{:}\sigma_k].\,e \mid {} \\
 & & & e\,[e_1, \ldots, e_k] \mid \Lambda t{::}\kappa.\,e \mid e[\mu] \mid {} \\
 & & & \mathsf{eqint}(e_1, e_2) \mid \mathsf{eqfloat}(e_1, e_2) \mid \mathsf{if0}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \mid {} \\
 & & & \mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma](e_i; e_f; e_u; e_p; e_a)
\end{array}
\]

Figure 5.1: Syntax of $\lambda^{ML}_i$-Rep


5.1 The Target Language: $\lambda^{ML}_i$-Rep

The syntactic classes of the target language, $\lambda^{ML}_i$-Rep, are given in Figure 5.1. $\lambda^{ML}_i$-Rep extends $\lambda^{ML}_i$ by adding floats, products, and $k$-argument functions (for some fixed, but arbitrary $k$). The constructor level reflects these changes by the addition of Float, Unit, $\mathsf{Prod}(\mu_1, \mu_2)$, and $\mathsf{Arrow}([\mu_1, \ldots, \mu_k], \mu)$ constructors, and by the addition of arms within Typerec and typerec corresponding to these constructors. In addition, $\lambda^{ML}_i$-Rep provides primitive equality functions for integer and floating point values (eqint and eqfloat) as well as an if0 construct. However, $\lambda^{ML}_i$-Rep does not provide a polymorphic equality operator.

The values, evaluation contexts, and rewriting rules of $\lambda^{ML}_i$-Rep are essentially the same as for $\lambda^{ML}_i$ (see Section 3.3), with the addition of the product operations, the equality operations, and if0, so I omit these details. The added constructor and term formation rules for $\lambda^{ML}_i$-Rep are given in Figures 5.2 and 5.3 respectively.

Technically, all $\lambda^{ML}_i$-Rep functions have $k$ arguments, but I use $\lambda[x_1{:}\sigma_1, \ldots, x_n{:}\sigma_n].\,e$ where $n < k$ to represent the term:

\[ \lambda[x_1{:}\sigma_1, \ldots, x_n{:}\sigma_n, x_{n+1}{:}\mathsf{unit}, \ldots, x_k{:}\mathsf{unit}].\,e \]

when $x_{n+1}, \ldots, x_k$ do not occur free in $e$. Similarly, I write $e\,[e_1, \ldots, e_n]$ where $n < k$ to represent the term $e\,[e_1, \ldots, e_n, ()_{n+1}, \ldots, ()_k]$. In the degenerate case where $n = 1$, I drop the brackets entirely and simply write $\lambda x_1{:}\sigma_1.\,e$ and $e\,e_1$. I use similar abbreviations for arrow constructors and types.

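When convenient, I use ML-style pattern matching to define a constructor involving Typerec or a term involving typerec. For instance, instead of writing

\[
\begin{array}{l}
F = \lambda t{::}\Omega.\ \mathsf{Typerec}\ t\ \mathsf{of} \\
\quad (\mu_i;\ \mu_f;\ \mu_u; \\
\quad\ \ \lambda t_1{::}\Omega.\,\lambda t_2{::}\Omega.\,\lambda t_1'{::}\kappa.\,\lambda t_2'{::}\kappa.\,\mu_p; \\
\quad\ \ \lambda t_1{::}\Omega.\cdots\lambda t_k{::}\Omega.\,\lambda t{::}\Omega.\,\lambda t_1'{::}\kappa.\cdots\lambda t_k'{::}\kappa.\,\lambda t'{::}\kappa.\,\mu_a)
\end{array}
\]

I write:

\[
\begin{array}{lcl}
F[\mathsf{Int}] & = & \mu_i \\
F[\mathsf{Float}] & = & \mu_f \\
F[\mathsf{Unit}] & = & \mu_u \\
F[\mathsf{Prod}(t_1, t_2)] & = & \{F[t_1]/t_1', F[t_2]/t_2'\}\mu_p \\
F[\mathsf{Arrow}([t_1, \ldots, t_k], t)] & = & \{F[t_1]/t_1', \ldots, F[t_k]/t_k', F[t]/t'\}\mu_a
\end{array}
\]

As in SML, I use an underscore (_) to represent a wildcard match.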

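I also use the derived form Typecase for an application of Typerec where the inductive cases are unused. Hence, the pattern matching constructor

\[
\begin{array}{lcl}
F[\mathsf{Int}] & = & \mu_i \\
F[\mathsf{Float}] & = & \mu_f \\
F[\mathsf{Unit}] & = & \mu_u \\
F[\mathsf{Prod}(t_1, t_2)] & = & \{F[t_1]/t_1', F[t_2]/t_2'\}\mu_p \\
F[\mathsf{Arrow}([t_1, \ldots, t_k], t)] & = & \{F[t_1]/t_1', \ldots, F[t_k]/t_k', F[t]/t'\}\mu_a
\end{array}
\]

where $t_1'$ and $t_2'$ do not occur free in $\mu_p$ and $t_1', \ldots, t_k', t'$ do not occur free in $\mu_a$ may be written using Typecase as follows:

\[
\begin{array}{l}
\mathsf{Typecase}\ \mu\ \mathsf{of} \\
\quad\ \ \mathsf{Int} \Rightarrow \mu_i \\
\quad \mid \mathsf{Float} \Rightarrow \mu_f \\
\quad \mid \mathsf{Unit} \Rightarrow \mu_u \\
\quad \mid \mathsf{Prod}(t_1, t_2) \Rightarrow \mu_p \\
\quad \mid \mathsf{Arrow}([t_1, \ldots, t_k], t) \Rightarrow \mu_a
\end{array}
\]

Similarly, I use a derived typecase expression form for instances of typerec where the inductive cases are unused.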
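The pattern-matching reading of Typerec and Typecase can be sketched in Python as a fold over constructor trees. This is only an illustration under stated assumptions: constructors are encoded as tagged tuples such as ("Prod", a, b) and ("Arrow", (d1, ..., dk), r), ordinary Python values stand in for constructors of kind $\kappa$, and the names typerec, typecase, size, and is_product are hypothetical and do not appear in the thesis.

```python
def typerec(mu, arm_i, arm_f, arm_u, arm_p, arm_a):
    """Fold over the constructor mu, in the style of Typerec.

    The inductive arms receive the component constructors together with
    the results of unrolling the fold on those components.
    """
    def rec(c):
        return typerec(c, arm_i, arm_f, arm_u, arm_p, arm_a)

    tag = mu[0]
    if tag == "Int":
        return arm_i
    if tag == "Float":
        return arm_f
    if tag == "Unit":
        return arm_u
    if tag == "Prod":
        t1, t2 = mu[1], mu[2]
        return arm_p(t1, t2, rec(t1), rec(t2))
    if tag == "Arrow":
        ts, t = mu[1], mu[2]
        return arm_a(ts, t, tuple(rec(d) for d in ts), rec(t))
    raise ValueError(f"not a constructor value: {mu!r}")


def typecase(mu, arm_i, arm_f, arm_u, arm_p, arm_a):
    """Derived form: the arms never look at the unrolled results."""
    return typerec(
        mu, arm_i, arm_f, arm_u,
        lambda t1, t2, _r1, _r2: arm_p(t1, t2),
        lambda ts, t, _rs, _r: arm_a(ts, t),
    )


def size(mu):
    """Hypothetical F: number of base constructors occurring in mu."""
    return typerec(
        mu, 1, 1, 1,
        lambda _t1, _t2, r1, r2: r1 + r2,
        lambda _ts, _t, rs, r: sum(rs) + r,
    )


def is_product(mu):
    """Hypothetical Typecase use: only the outermost constructor matters."""
    return typecase(mu, False, False, False,
                    lambda _t1, _t2: True,
                    lambda _ts, _t: False)
```

Here size uses the inductive results, so it corresponds to a genuine Typerec, while is_product ignores them and so fits the Typecase abbreviation.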
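\[ \text{(unit)}\quad \Delta \vdash \mathsf{Unit} :: \Omega \]

\[ \text{(prod)}\quad \frac{\Delta \vdash \mu_1 :: \Omega \qquad \Delta \vdash \mu_2 :: \Omega}{\Delta \vdash \mathsf{Prod}(\mu_1, \mu_2) :: \Omega} \]

\[ \text{(arrow)}\quad \frac{\Delta \vdash \mu_1 :: \Omega \quad \cdots \quad \Delta \vdash \mu_k :: \Omega \qquad \Delta \vdash \mu :: \Omega}{\Delta \vdash \mathsf{Arrow}([\mu_1, \ldots, \mu_k], \mu) :: \Omega} \]

\[
\text{(trec)}\quad
\frac{\begin{array}{c}
\Delta \vdash \mu :: \Omega \qquad \Delta \vdash \mu_i :: \kappa \\
\Delta \vdash \mu_f :: \kappa \qquad \Delta \vdash \mu_u :: \kappa \\
\Delta \vdash \mu_p :: \Omega \to \Omega \to \kappa \to \kappa \to \kappa \\
\Delta \vdash \mu_a :: \Omega_1 \to \cdots \to \Omega_k \to \Omega \to \kappa_1 \to \cdots \to \kappa_k \to \kappa \to \kappa
\end{array}}
{\Delta \vdash \mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a) :: \kappa}
\quad (\Omega_1, \ldots, \Omega_k = \Omega;\ \kappa_1, \ldots, \kappa_k = \kappa)
\]

Figure 5.2: Added Constructor Formation Rules for $\lambda^{ML}_i$-Rep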


5.2 Compiling Mini-ML to $\lambda^{ML}_i$-Rep

With the definitions of the source and target languages in place, I can define a translation from Mini-ML to $\lambda^{ML}_i$-Rep. In this section, I carefully develop such a translation, concentrating first on a translation from Mini-ML types to $\lambda^{ML}_i$-Rep constructors. Next, I develop a term translation and show that it respects the type translation. Finally, I prove that the translation is correct by establishing a suitable family of simulation relations and by showing that a well-formed Mini-ML expression is appropriately simulated by its $\lambda^{ML}_i$-Rep translation.


5.2.1  Translation  of  Types 

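I translate Mini-ML monotypes to $\lambda^{ML}_i$-Rep constructors via the function $|\tau|$, which is defined by induction on $\tau$ as follows:

\[
\begin{array}{rcl}
|t| & = & t \\
|\mathsf{int}| & = & \mathsf{Int} \\
|\mathsf{float}| & = & \mathsf{Float} \\
|\mathsf{unit}| & = & \mathsf{Unit} \\
|(\tau_1 \times \tau_2)| & = & \mathsf{Prod}(|\tau_1|, |\tau_2|) \\
|\tau_1 \to \tau_2| & = & \mathsf{Vararg}\ |\tau_1|\ |\tau_2|
\end{array}
\]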
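\[ \text{(prod)}\quad \frac{\Delta; \Gamma \vdash e_1 : \sigma_1 \qquad \Delta; \Gamma \vdash e_2 : \sigma_2}{\Delta; \Gamma \vdash (e_1, e_2) : (\sigma_1 \times \sigma_2)} \]

\[ \text{(proj)}\quad \frac{\Delta; \Gamma \vdash e : (\sigma_1 \times \sigma_2)}{\Delta; \Gamma \vdash \pi_i\,e : \sigma_i} \quad (i = 1, 2) \]

\[ \text{(eqi)}\quad \frac{\Delta; \Gamma \vdash e_1 : \mathsf{int} \qquad \Delta; \Gamma \vdash e_2 : \mathsf{int}}{\Delta; \Gamma \vdash \mathsf{eqint}(e_1, e_2) : \mathsf{int}} \]

\[ \text{(eqf)}\quad \frac{\Delta; \Gamma \vdash e_1 : \mathsf{float} \qquad \Delta; \Gamma \vdash e_2 : \mathsf{float}}{\Delta; \Gamma \vdash \mathsf{eqfloat}(e_1, e_2) : \mathsf{int}} \]

\[ \text{(if0)}\quad \frac{\Delta; \Gamma \vdash e_1 : \mathsf{int} \qquad \Delta; \Gamma \vdash e_2 : \sigma \qquad \Delta; \Gamma \vdash e_3 : \sigma}{\Delta; \Gamma \vdash \mathsf{if0}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 : \sigma} \]

\[ \text{(abs)}\quad \frac{\Delta; \Gamma \uplus \{x_1{:}\sigma_1, \ldots, x_k{:}\sigma_k\} \vdash e : \sigma}{\Delta; \Gamma \vdash \lambda[x_1{:}\sigma_1, \ldots, x_k{:}\sigma_k].\,e : [\sigma_1, \ldots, \sigma_k] \to \sigma} \]

\[ \text{(app)}\quad \frac{\Delta; \Gamma \vdash e : [\sigma_1, \ldots, \sigma_k] \to \sigma \qquad \Delta; \Gamma \vdash e_1 : \sigma_1 \quad \cdots \quad \Delta; \Gamma \vdash e_k : \sigma_k}{\Delta; \Gamma \vdash e\,[e_1, \ldots, e_k] : \sigma} \]

\[
\text{(trec)}\quad
\frac{\begin{array}{c}
\Delta \vdash \mu :: \Omega \qquad \Delta; \Gamma \vdash e_i : \{\mathsf{Int}/t\}\sigma \\
\Delta; \Gamma \vdash e_f : \{\mathsf{Float}/t\}\sigma \qquad \Delta; \Gamma \vdash e_u : \{\mathsf{Unit}/t\}\sigma \\
\Delta; \Gamma \vdash e_p : \forall t_1{::}\Omega.\,\forall t_2{::}\Omega.\,[\{t_1/t\}\sigma] \to [\{t_2/t\}\sigma] \to \{\mathsf{Prod}(t_1, t_2)/t\}\sigma \\
\Delta; \Gamma \vdash e_a : \forall t_1{::}\Omega.\cdots\forall t_k{::}\Omega.\,\forall t'{::}\Omega.\,[\{t_1/t\}\sigma] \to \cdots \to [\{t_k/t\}\sigma] \to [\{t'/t\}\sigma] \to \{\mathsf{Arrow}([t_1, \ldots, t_k], t')/t\}\sigma
\end{array}}
{\Delta; \Gamma \vdash \mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma](e_i; e_f; e_u; e_p; e_a) : \{\mu/t\}\sigma}
\]

Figure 5.3: Added Term Formation Rules for $\lambda^{ML}_i$-Rep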
The translation maps each type to its corresponding constructor except for arrow types: an arrow type whose domain is a tuple is flattened into an arrow constructor with multiple arguments by the Vararg constructor function. Vararg is defined using Typecase as follows:

\[
\begin{array}{l}
\mathsf{Vararg} = \lambda t{::}\Omega.\,\lambda t'{::}\Omega. \\
\quad \mathsf{Typecase}\ t\ \mathsf{of} \\
\quad\quad\ \ \mathsf{Prod}(t_1, t_2) \Rightarrow \mathsf{Arrow}([t_1, t_2], t') \\
\quad\quad \mid \_ \Rightarrow \mathsf{Arrow}(t, t'),
\end{array}
\]

and has the property that, if $\tau = (\tau_1 \times \tau_2)$, then

\[ \mathsf{Vararg}\ |\tau|\ |\tau'| = \mathsf{Arrow}([|\tau_1|, |\tau_2|], |\tau'|). \]

Alternatively, if $\tau$ is not a product (and not a variable), then Vararg does not flatten the domain. For instance,

\[ \mathsf{Vararg}\ \mathsf{Int}\ |\tau'| = \mathsf{Arrow}(\mathsf{Int}, |\tau'|). \]

In effect, Vararg reifies the type translation for arrow types as a function at the constructor level.
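The type translation and the behavior of Vararg can be sketched in Python. The encoding is an assumption made only for illustration: source types are lowercase tagged tuples such as ("arrow", a, b), target constructors are capitalized tagged tuples such as ("Arrow", (d1, ..., dk), r), and a Vararg application whose domain is not yet a constructor value is kept as a suspended ("Vararg", t, t') node. The names vararg_con and translate are hypothetical.

```python
def vararg_con(dom, res):
    """Constructor-level Vararg, following the Typecase definition."""
    if dom[0] == "Prod":
        # Product domain: flatten into a two-argument arrow.
        return ("Arrow", (dom[1], dom[2]), res)
    if dom[0] in ("Var", "Vararg"):
        # Domain is not yet a constructor value: suspend the application.
        return ("Vararg", dom, res)
    # Any other constructor value: a one-argument arrow.
    return ("Arrow", (dom,), res)


def translate(ty):
    """|tau|: map a Mini-ML monotype to a target constructor."""
    tag = ty[0]
    if tag == "var":
        return ("Var", ty[1])
    if tag in ("int", "float", "unit"):
        return (tag.capitalize(),)
    if tag == "prod":
        return ("Prod", translate(ty[1]), translate(ty[2]))
    if tag == "arrow":
        return vararg_con(translate(ty[1]), translate(ty[2]))
    raise ValueError(f"unknown source type: {ty!r}")


INT, FLOAT = ("int",), ("float",)
pair_to_int = translate(("arrow", ("prod", INT, FLOAT), INT))  # flattened
int_to_int = translate(("arrow", INT, INT))                    # not flattened
var_to_int = translate(("arrow", ("var", "t"), INT))           # suspended
```

The third example shows the delayed case: with a type variable as the domain, the choice between one and two arguments is left open until the variable is instantiated.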

The  type  translation  is  extended  to  map  source  type  schemes  to  target  types,  source 
type  assignments  to  target  type  assignments,  and  source  kind  assignments  to  target  kind 
assignments  as  follows: 


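\[
\begin{array}{rcl}
|\forall t_1, \ldots, t_n.\tau| & = & \forall t_1{::}\Omega.\cdots\forall t_n{::}\Omega.\,T(|\tau|) \\
|\Gamma| & = & \{x{:}|\Gamma(x)| \mid x \in \mathrm{Dom}(\Gamma)\} \\
|\Delta| & = & \{t{::}\Omega \mid t \in \mathrm{Dom}(\Delta)\}
\end{array}
\]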

This  type  translation  has  the  very  important  property  that  it  commutes  with  substitu¬ 
tion.  This  is  in  stark  contrast  to  any  of  the  coercion-based  approaches  to  polymorphism, 
where  this  property  does  not  hold  and  a  term-level  coercion  must  be  used  to  mitigate 
the  mismatch.  In  some  sense,  my  type  translation  is  “self-correcting”  when  I  perform 
substitution,  because  the  computation  of  an  arrow  type,  whose  domain  is  unknown,  is 
delayed  until  the  type  is  apparent. 

Lemma 5.2.1 $|\{\tau'/t\}\tau| = \{|\tau'|/t\}|\tau|$.

Proof:  By  induction  on  r.  □ 
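The commutation property can be checked on small examples with a Python sketch. It rests on the same illustrative assumptions as before: source types and target constructors are tagged tuples, a Vararg application with an unknown domain is stored as a suspended node, and substitution on constructors re-runs Vararg so that suspended nodes are resolved once their domain becomes known. All function names are hypothetical, and a handful of passing examples is of course not a proof.

```python
def vararg_con(dom, res):
    """Constructor-level Vararg; suspended when the domain is unknown."""
    if dom[0] == "Prod":
        return ("Arrow", (dom[1], dom[2]), res)
    if dom[0] in ("Var", "Vararg"):
        return ("Vararg", dom, res)
    return ("Arrow", (dom,), res)


def translate(ty):
    """|tau| for Mini-ML monotypes encoded as lowercase tagged tuples."""
    tag = ty[0]
    if tag == "var":
        return ("Var", ty[1])
    if tag in ("int", "float", "unit"):
        return (tag.capitalize(),)
    if tag == "prod":
        return ("Prod", translate(ty[1]), translate(ty[2]))
    return vararg_con(translate(ty[1]), translate(ty[2]))  # "arrow"


def subst_src(ty, name, repl):
    """{repl/name}ty on source types."""
    tag = ty[0]
    if tag == "var":
        return repl if ty[1] == name else ty
    if tag in ("prod", "arrow"):
        return (tag, subst_src(ty[1], name, repl), subst_src(ty[2], name, repl))
    return ty


def subst_con(con, name, repl):
    """{repl/name}con on constructors, normalizing suspended Vararg nodes."""
    tag = con[0]
    if tag == "Var":
        return repl if con[1] == name else con
    if tag == "Prod":
        return ("Prod", subst_con(con[1], name, repl), subst_con(con[2], name, repl))
    if tag == "Arrow":
        dom = tuple(subst_con(d, name, repl) for d in con[1])
        return ("Arrow", dom, subst_con(con[2], name, repl))
    if tag == "Vararg":
        # The delayed Typecase fires as soon as the domain is known.
        return vararg_con(subst_con(con[1], name, repl), subst_con(con[2], name, repl))
    return con


def commutes(ty, name, repl):
    """Check |{repl/name}ty| == {|repl|/name}|ty| on one example."""
    lhs = translate(subst_src(ty, name, repl))
    rhs = subst_con(translate(ty), name, translate(repl))
    return lhs == rhs
```

The interesting case is a function type whose domain is the substituted variable: the suspended Vararg node is what lets the right-hand side produce a flattened arrow after the fact.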
5.2.2 Translation of Terms

I specify the translation of Mini-ML expressions as a deductive system using judgments of the form $\Delta; \Gamma \vdash e : \sigma \Rightarrow e'$ where $\Delta; \Gamma \vdash e : \sigma$ is a Mini-ML typing judgment and $e'$ is the $\lambda^{ML}_i$-Rep translation of $e$. The axioms and inference rules that allow us to conclude this judgment are given in Figure 5.4.

Much of the translation is straightforward: the translation of variables, integers, and floating point values is the identity; the translation of if0 and $\pi_i$ expressions is obtained by simply translating the component expressions. In the following subsections, I present the translation of equality, functions, application, type abstractions, and type application in detail.


5.2.3  Translation  of  Equality 

An  equality  operation  is  translated  according  to  the  following  rule: 

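\[ \frac{\Delta; \Gamma \vdash e_1 : \tau \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau \Rightarrow e_2'}{\Delta; \Gamma \vdash \mathsf{eq}(e_1, e_2) : \mathsf{int} \Rightarrow \mathsf{peq}[|\tau|][e_1', e_2']} \]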


The  translation  uses  an  auxiliary  function,  peq,  that  can  be  coded  in  the  target  language 
using  typerec.  Here,  I  use  the  pattern-matching  syntax  to  define  such  a  function: 


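\[
\begin{array}{lcl}
\mathsf{peq}[\mathsf{Int}] & = & \lambda[x_1{:}\mathsf{int}, x_2{:}\mathsf{int}].\ \mathsf{eqint}(x_1, x_2) \\
\mathsf{peq}[\mathsf{Float}] & = & \lambda[x_1{:}\mathsf{float}, x_2{:}\mathsf{float}].\ \mathsf{eqfloat}(x_1, x_2) \\
\mathsf{peq}[\mathsf{Unit}] & = & \lambda[x_1{:}\mathsf{unit}, x_2{:}\mathsf{unit}].\ 1 \\
\mathsf{peq}[\mathsf{Prod}(t_a, t_b)] & = & \lambda[x_1{:}T(\mathsf{Prod}(t_a, t_b)), x_2{:}T(\mathsf{Prod}(t_a, t_b))]. \\
 & & \quad \mathsf{if0}\ \mathsf{peq}[t_a][\pi_1\,x_1, \pi_1\,x_2]\ \mathsf{then}\ 0\ \mathsf{else}\ \mathsf{peq}[t_b][\pi_2\,x_1, \pi_2\,x_2] \\
\mathsf{peq}[t] & = & \lambda[x_1{:}T(t), x_2{:}T(t)].\ 0
\end{array}
\]

Operationally, peq takes a constructor as an argument and selects the appropriate comparison function according to that constructor. For a product $\mathsf{Prod}(t_a, t_b)$, the appropriate function is constructed by using the inductive arguments $\mathsf{peq}[t_a]$ and $\mathsf{peq}[t_b]$ to compare the components of the product.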
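A minimal Python sketch of peq follows, assuming constructors are encoded as tagged tuples, products are Python pairs, and unit is None. These encodings and the function names are illustrative assumptions, not part of the thesis. Python values do carry run-time tags, so the sketch shows only the control structure: the comparison function is selected from the constructor argument alone, never by inspecting the values being compared.

```python
def eqint(x1, x2):
    """Primitive integer equality: 1 when equal, 0 otherwise."""
    return 1 if x1 == x2 else 0


def eqfloat(x1, x2):
    """Primitive floating point equality: 1 when equal, 0 otherwise."""
    return 1 if x1 == x2 else 0


def peq(con):
    """Select a comparison function by dispatching on the constructor."""
    tag = con[0]
    if tag == "Int":
        return eqint
    if tag == "Float":
        return eqfloat
    if tag == "Unit":
        return lambda x1, x2: 1
    if tag == "Prod":
        # Inductive arguments: comparisons for the two components.
        peq_a, peq_b = peq(con[1]), peq(con[2])
        # if0 peq_a[...] then 0 else peq_b[...]
        return lambda x1, x2: (
            0 if peq_a(x1[0], x2[0]) == 0 else peq_b(x1[1], x2[1])
        )
    # Any other constructor (arrows): the comparison always yields 0.
    return lambda x1, x2: 0


POINT = ("Prod", ("Int",), ("Prod", ("Float",), ("Unit",)))
same = peq(POINT)((1, (2.5, None)), (1, (2.5, None)))
different = peq(POINT)((1, (2.5, None)), (1, (3.5, None)))
```

Because peq(con) returns a closure, the dispatch on the constructor happens once per instantiation, and the resulting function can then be applied to many pairs of values.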

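\[ \text{(var)}\quad \Delta; \Gamma \uplus \{x{:}\sigma\} \vdash x : \sigma \Rightarrow x \qquad \text{(int)}\quad \Delta; \Gamma \vdash i : \mathsf{int} \Rightarrow i \]

\[ \text{(unit)}\quad \Delta; \Gamma \vdash () : \mathsf{unit} \Rightarrow () \qquad \text{(float)}\quad \Delta; \Gamma \vdash f : \mathsf{float} \Rightarrow f \]

\[ \text{(eq)}\quad \frac{\Delta; \Gamma \vdash e_1 : \tau \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau \Rightarrow e_2'}{\Delta; \Gamma \vdash \mathsf{eq}(e_1, e_2) : \mathsf{int} \Rightarrow \mathsf{peq}[|\tau|][e_1', e_2']} \]

\[ \text{(if0)}\quad \frac{\Delta; \Gamma \vdash e_1 : \mathsf{int} \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau \Rightarrow e_2' \qquad \Delta; \Gamma \vdash e_3 : \tau \Rightarrow e_3'}{\Delta; \Gamma \vdash \mathsf{if0}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 : \tau \Rightarrow \mathsf{if0}\ e_1'\ \mathsf{then}\ e_2'\ \mathsf{else}\ e_3'} \]

\[ \text{(pair)}\quad \frac{\Delta; \Gamma \vdash e_1 : \tau_1 \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau_2 \Rightarrow e_2'}{\Delta; \Gamma \vdash (e_1, e_2) : (\tau_1 \times \tau_2) \Rightarrow (e_1', e_2')} \]

\[ \text{(proj)}\quad \frac{\Delta; \Gamma \vdash e : (\tau_1 \times \tau_2) \Rightarrow e'}{\Delta; \Gamma \vdash \pi_i\,e : \tau_i \Rightarrow \pi_i\,e'} \quad (i = 1, 2) \]

\[ \text{(abs)}\quad \frac{\Delta; \Gamma \uplus \{x{:}\tau_1\} \vdash e : \tau_2 \Rightarrow e'}{\Delta; \Gamma \vdash \lambda x{:}\tau_1.\,e : \tau_1 \to \tau_2 \Rightarrow \mathsf{vararg}[|\tau_1|][|\tau_2|](\lambda x{:}T(|\tau_1|).\,e')} \]

\[ \text{(app)}\quad \frac{\Delta; \Gamma \vdash e_1 : \tau_1 \to \tau_2 \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau_1 \Rightarrow e_2'}{\Delta; \Gamma \vdash e_1\,e_2 : \tau_2 \Rightarrow (\mathsf{onearg}[|\tau_1|][|\tau_2|]\,e_1')\,e_2'} \]

\[ \text{(def)}\quad \frac{\Delta; \Gamma \vdash v : \sigma \Rightarrow v' \qquad \Delta; \Gamma \uplus \{x{:}\sigma\} \vdash e : \tau \Rightarrow e'}{\Delta; \Gamma \vdash \mathsf{def}\ x{:}\sigma = v\ \mathsf{in}\ e : \tau \Rightarrow \mathsf{let}\ x{:}\sigma = v'\ \mathsf{in}\ e'} \]

\[ \text{(tapp)}\quad \frac{\Delta \vdash \tau_1 \quad \cdots \quad \Delta \vdash \tau_n \qquad \Delta; \Gamma \vdash v : \forall t_1, \ldots, t_n.\tau \Rightarrow v'}{\Delta; \Gamma \vdash v[\tau_1, \ldots, \tau_n] : \{\tau_1/t_1, \ldots, \tau_n/t_n\}\tau \Rightarrow v'[|\tau_1|] \cdots [|\tau_n|]} \]

\[ \text{(tabs)}\quad \frac{\Delta \uplus \{t_1, \ldots, t_n\}; \Gamma \vdash e : \tau \Rightarrow e'}{\Delta; \Gamma \vdash \Lambda t_1, \ldots, t_n.\,e : \forall t_1, \ldots, t_n.\tau \Rightarrow \Lambda t_1{::}\Omega.\cdots\Lambda t_n{::}\Omega.\,e'} \]

Figure 5.4: Translation from Mini-ML to $\lambda^{ML}_i$-Rep

Expanding the pattern matching abbreviation for peq to a typerec yields:

\[ \Lambda t{::}\Omega.\ \mathsf{typerec}\ t\ \mathsf{of}\ [t.\sigma](e_i; e_f; e_u; e_p; e_a) \]

where (eliding some kind and type information)

\[
\begin{array}{rcl}
\sigma & = & [T(t), T(t)] \to \mathsf{int} \\
e_i & = & \lambda[x_1{:}\mathsf{int}, x_2{:}\mathsf{int}].\ \mathsf{eqint}(x_1, x_2) \\
e_f & = & \lambda[x_1{:}\mathsf{float}, x_2{:}\mathsf{float}].\ \mathsf{eqfloat}(x_1, x_2) \\
e_u & = & \lambda[x_1{:}\mathsf{unit}, x_2{:}\mathsf{unit}].\ 1 \\
e_p & = & \Lambda t_a.\,\Lambda t_b.\,\lambda \mathit{peq}_a.\,\lambda \mathit{peq}_b.\,\lambda[x_1{:}T(\mathsf{Prod}(t_a, t_b)), x_2{:}T(\mathsf{Prod}(t_a, t_b))]. \\
 & & \quad \mathsf{if0}\ \mathit{peq}_a[\pi_1\,x_1, \pi_1\,x_2]\ \mathsf{then}\ 0\ \mathsf{else}\ \mathit{peq}_b[\pi_2\,x_1, \pi_2\,x_2] \\
e_a & = & \Lambda t_1.\cdots\Lambda t_k.\,\Lambda t'.\,\lambda \mathit{peq}_1.\cdots\lambda \mathit{peq}_k.\,\lambda \mathit{peq}'. \\
 & & \quad \lambda[x_1{:}T(\mathsf{Arrow}([t_1, \ldots, t_k], t')), x_2{:}T(\mathsf{Arrow}([t_1, \ldots, t_k], t'))].\ 0
\end{array}
\]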
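From this definition, it is easy to verify that

\[ \vdash \mathsf{peq} : \forall t{::}\Omega.\,[T(t), T(t)] \to \mathsf{int}. \]

The derivation proceeds as follows: By the tabs rule (see Figure 5.4), it suffices to show

\[ \{t{::}\Omega\}; \emptyset \vdash \mathsf{typerec}\ t\ \mathsf{of}\ [t.\sigma](e_i; e_f; e_u; e_p; e_a) : \sigma. \]

This follows if I can derive the preconditions of the trec rule for $\lambda^{ML}_i$-Rep (see Figure 5.3). For instance, I must show that

\[ \{t{::}\Omega\}; \emptyset \vdash e_i : \{\mathsf{Int}/t\}\sigma, \]

which follows from the derivation below:

\[
\frac{\dfrac{\{t{::}\Omega\}; \{x_1{:}\mathsf{int}, x_2{:}\mathsf{int}\} \vdash x_1 : \mathsf{int} \qquad \{t{::}\Omega\}; \{x_1{:}\mathsf{int}, x_2{:}\mathsf{int}\} \vdash x_2 : \mathsf{int}}
{\{t{::}\Omega\}; \{x_1{:}\mathsf{int}, x_2{:}\mathsf{int}\} \vdash \mathsf{eqint}(x_1, x_2) : \mathsf{int}}}
{\{t{::}\Omega\}; \emptyset \vdash \lambda[x_1{:}\mathsf{int}, x_2{:}\mathsf{int}].\,\mathsf{eqint}(x_1, x_2) : \{\mathsf{Int}/t\}([T(t), T(t)] \to \mathsf{int})}
\]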

The  other  cases  follow  in  a  similar  manner. 

Intuitively,  peq  implements  the  first  five  rewriting  rules  of  the  dynamic  semantics 
of  Mini-ML  (see  Figure  2.3),  but  it  does  so  by  dispatching  on  its  constructor  argument 
instead  of  the  shape  of  its  value  arguments. 

5.2.4  Translation  of  Functions 

There are three cases to consider when translating a $\lambda$-expression:

1. the argument type is known to be a tuple;

2. the argument type is int, float, unit, or an arrow type;

3. the argument type is a type variable.


In  the  first  case,  the  argument  is  a  tuple.  I  need  to  produce  a  function  that  takes 
the  components  of  the  tuple  directly  as  arguments.  I  translate  the  body  of  the  function 
under  the  assumption  that  the  argument  was  passed  as  a  tuple.  Then,  I  abstract  the 
arguments  appropriately.  However,  before  executing  the  body  of  the  function,  I  allocate 
a  tuple  and  bind  it  to  the  original  parameter,  x. 


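\[
\frac{\Delta; \Gamma \uplus \{x{:}(\tau_1 \times \tau_2)\} \vdash e : \tau' \Rightarrow e'}
{\begin{array}{l}
\Delta; \Gamma \vdash \lambda x{:}(\tau_1 \times \tau_2).\,e : (\tau_1 \times \tau_2) \to \tau' \Rightarrow {} \\
\quad \lambda[x_1{:}T(|\tau_1|), x_2{:}T(|\tau_2|)].\ \mathsf{let}\ x{:}T(|(\tau_1 \times \tau_2)|) = (x_1, x_2)\ \mathsf{in}\ e'
\end{array}}
\quad (x_1, x_2 \notin \mathrm{Dom}(\Gamma))
\]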


It is easy for an optimizer to replace projections $\pi_i\,x$ within the translated body of the function with the appropriate argument, $x_i$. When the tuple is used only to simulate multiple arguments, the variable $x$ will occur only within such projections. Hence, all occurrences of $x$ will be eliminated by the optimizer, and the binding of the tuple to $x$ will become “dead” and can be eliminated altogether.

I leave these optimizations out of the translation for two reasons. First, such optimizations make reasoning about the underlying translation more difficult. Second, the optimizations (projection elimination and dead code elimination) are generally useful and could be applied after other passes in the compiler. Hence, for the sake of modularity it is best to leave these transformations as separate passes over the target code.

In  the  second  case,  the  argument  is  a  non-tuple  and  a  non-variable.  No  flattening 
need  occur  and  the  translation  is  straightforward: 

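\[ \frac{\Delta; \Gamma \uplus \{x{:}\tau\} \vdash e : \tau' \Rightarrow e'}{\Delta; \Gamma \vdash \lambda x{:}\tau.\,e : \tau \to \tau' \Rightarrow \lambda x{:}T(|\tau|).\,e'} \]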


In  the  third  case,  the  argument  type  of  the  function  is  a  type  variable  (t).  If  this 
variable  is  instantiated  with  a  tuple  type,  then  the  function  should  be  flattened;  otherwise, 
the  function  should  not  be  flattened.  I  use  a  term-level  typecase  to  decide  which  calling 
convention  to  use.  To  avoid  duplicating  the  function  body  for  each  case,  I  borrow  an  idea 
from  the  coercion-based  approaches:  Pick  one  calling  convention,  compile  the  function 
using  this  convention,  and  for  each  case,  calculate  a  coercion  from  the  expected  to  the 
actual  calling  convention.  For  instance,  I  might  compile  the  function  as  if  there  was  one 
argument  of  type  t  and  then  use  typecase  to  calculate  the  proper  coercion  to  multiple 
arguments,  depending  on  the  instantiation  of  t.  This  leads  to  the  following  translation: 

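\[ \frac{\Delta; \Gamma \uplus \{x{:}t\} \vdash e : \tau_2 \Rightarrow e'}{\Delta; \Gamma \vdash \lambda x{:}t.\,e : t \to \tau_2 \Rightarrow \mathsf{vararg}[t][|\tau_2|](\lambda x{:}T(t).\,e')} \]

where the term vararg is defined as follows:

\[
\begin{array}{l}
\mathsf{vararg} = \Lambda t{::}\Omega.\,\Lambda t'{::}\Omega. \\
\quad \mathsf{typecase}\ t\ \mathsf{of} \\
\quad\quad\ \ \mathsf{Prod}(t_1, t_2) \Rightarrow \lambda[x{:}[T(\mathsf{Prod}(t_1, t_2))] \to T(t')].\,\lambda[x_1{:}T(t_1), x_2{:}T(t_2)].\,x\,[(x_1, x_2)] \\
\quad\quad \mid \_ \Rightarrow \lambda[x{:}[T(t)] \to T(t')].\,x
\end{array}
\]

Expanding the pattern matching typecase to a formal typerec yields

\[ \mathsf{typerec}\ t\ \mathsf{of}\ [t.\sigma](e_0[\mathsf{Int}]; e_0[\mathsf{Float}]; e_0[\mathsf{Unit}]; e_p; e_a), \]

where (eliding some kind and type annotations)

\[
\begin{array}{rcl}
\sigma & = & [[T(t)] \to T(t')] \to T(\mathsf{Vararg}\ t\ t') \\
e_0 & = & \Lambda t''{::}\Omega.\,\lambda[x{:}[T(t'')] \to T(t')].\,x \\
e_a & = & \Lambda t_1.\cdots\Lambda t_k.\,\Lambda t''.\,\lambda[x_1'].\cdots\lambda[x_k'].\,\lambda[x''].\,\lambda[x{:}[T(\mathsf{Arrow}([t_1, \ldots, t_k], t''))] \to T(t')].\,x \\
e_p & = & \Lambda t_1.\,\Lambda t_2.\,\lambda[x_1'].\,\lambda[x_2'].\,\lambda[x{:}[T(\mathsf{Prod}(t_1, t_2))] \to T(t')].\,\lambda[x_1{:}T(t_1), x_2{:}T(t_2)].\,x\,[(x_1, x_2)].
\end{array}
\]

Notice that the inductive arguments for $e_a$ ($x_1', \ldots, x_k'$ and $x''$) and $e_p$ ($x_1'$ and $x_2'$) are unused. It is straightforward to show that the expansion of vararg yields a well-formed term with type $\forall t{::}\Omega.\,\forall t'{::}\Omega.\,\sigma$.

Lemma 5.2.2 $\vdash \mathsf{vararg} : \forall t{::}\Omega.\,\forall t'{::}\Omega.\,(T(t) \to T(t')) \to (T(\mathsf{Vararg}\ t\ t'))$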

As a direct result, the translation of functions using vararg preserves the type translation.

Lemma 5.2.3 If

\[ |\Delta|; |\Gamma \uplus \{x{:}\tau_1\}| \vdash e' : |\tau_2|, \]

then

\[ |\Delta|; |\Gamma| \vdash \mathsf{vararg}[|\tau_1|][|\tau_2|](\lambda x{:}T(|\tau_1|).\,e') : |\tau_1 \to \tau_2|. \]
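The behavior of the term-level vararg can be sketched in Python, where Python's own arity stands in for the calling convention. The tagged-tuple encoding of constructors and the helper make_identity are assumptions introduced only for this illustration.

```python
def vararg(t, t_res):
    """Term-level vararg: choose a calling convention by inspecting t.

    t_res mirrors the second constructor argument; it is never inspected.
    """
    if t[0] == "Prod":
        # Flattened convention: accept the two components separately,
        # rebuild the tuple, then call the one-argument function.
        return lambda f: (lambda x1, x2: f((x1, x2)))
    # Any other constructor: the function already has the right convention.
    return lambda f: f


def make_identity(t):
    """Sketch of a translated polymorphic identity: the constructor t is
    passed at run time and vararg picks the convention for that instance."""
    return vararg(t, t)(lambda x: x)


INT = ("Int",)
PAIR = ("Prod", INT, INT)

id_int = make_identity(INT)     # takes one argument
id_pair = make_identity(PAIR)   # takes two arguments, returns the pair
swap = vararg(PAIR, PAIR)(lambda p: (p[1], p[0]))
```

The function body is compiled once, against the one-argument convention. Only the thin wrapper differs between instances, which is the point of borrowing the coercion idea.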

In Chapter 8, I show that for most SML code (and, I conjecture, code in other similar languages), the argument type is almost always known at compile time, so we rarely need vararg to choose a calling convention at link time or run time. Therefore, most functions will be translated using one of the first two translation rules.

Furthermore, standard optimizations, such as compile-time $\beta$-reduction for constructor abstractions, can eliminate variable types at compile time and hence eliminate the need for vararg. Indeed, it is entirely reasonable to translate every function using vararg and allow an optimizer to fix up the inefficiencies. In this fashion, vararg reifies the monomorphic term translation of functions in the same way that Vararg reifies the type translation.

Since  the  source  language  does  not  have  first-class  polymorphism  and  the  scope  of 
a  polymorphic  value  is  constrained,  we  could  eliminate  all  polymorphism  at  compile 
time  and  thus,  all  occurrences  of  vararg.  However,  I  argued  earlier  that  eliminating  all 
polymorphism  at  compile  time  is  not  reasonable  since  it  duplicates  code  and  does  not  scale 
to  languages  with  first-class  polymorphism,  modules,  or  separate  compilation.  I  therefore 
leave  the  decision  to  inline  a  polymorphic  function  to  an  optimizer.  If  polymorphic 
functions  tend  to  be  small  or  the  number  of  uses  is  relatively  small,  then  a  reasonable 
strategy  is  to  inline  them  within  their  defining  compilation  unit. 

While  I  borrow  the  idea  of  a  coercion  to  mitigate  the  mismatch  in  calling  conventions, 
there  are  still  significant  differences  between  my  approach  and  the  approach  suggested  by 
Leroy (see Section 1.2.3). First, Leroy’s coercions can always be calculated at compile-time without the need to examine types at run-time. In contrast, vararg may need to examine constructors at run-time. However, I argued that Leroy’s approach does not scale to languages like $\lambda^{ML}_i$ that have first-class polymorphic values, while clearly, my approach can. Second, Leroy’s S and G coercions recursively pull apart any polymorphic
object  to  coerce  its  components  and  make  a  “deep”  copy  of  a  data  structure.  For  instance, 
if  we  have  a  vector  of  polymorphic  functions,  Leroy’s  coercions  will  traverse  and  create  a 
new  vector  when  the  unknown  types  are  instantiated.  In  contrast,  my  “shallow”  coercion 
only  affects  a  closure  as  it  is  constructed.  In  particular,  I  delay  the  construction  of  a 
vector  of  polymorphic  functions  until  the  unknown  types  are  apparent,  at  which  point 
the  appropriate  coercion  is  selected.  Hence,  no  copy  is  ever  generated  and  there  is  no 
coherency  issue  when  state  is  involved.  Finally,  while  I  use  a  coercion  to  mitigate  a 
mismatch  with  calling  conventions,  I  do  not  use  coercions  to  implement  all  language 
features (see for instance Section 5.4). Thus, typerec, Typerec, and the ideas of dynamic type dispatch support coercions, but are a much more general mechanism.

5.2.5  Translation  of  Applications 

As  with  functions,  there  are  three  cases  to  consider  when  translating  an  application:  the 
argument  type  is  either  a  tuple,  a  non-tuple,  or  a  variable. 

If the term is $e_1\,e_2$ and the argument $e_2$ has a tuple type, I need to extract the components of the tuple and pass them directly to the function. I translate $e_1$ and $e_2$, binding the resulting expressions to $x_1$ and $x_2$ via let. Then, I project the components of $x_2$ and pass them to $x_1$.

\[
\frac{\Delta; \Gamma \vdash e_1 : (\tau_1 \times \tau_2) \to \tau \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : (\tau_1 \times \tau_2) \Rightarrow e_2'}
{\begin{array}{l}
\Delta; \Gamma \vdash e_1\,e_2 : \tau \Rightarrow {} \\
\quad \mathsf{let}\ x_1{:}T(|(\tau_1 \times \tau_2) \to \tau|) = e_1'\ \mathsf{in}\ \mathsf{let}\ x_2{:}T(|(\tau_1 \times \tau_2)|) = e_2'\ \mathsf{in}\ x_1\,[\pi_1\,x_2, \pi_2\,x_2]
\end{array}}
\]

Again, an optimizer should eliminate unnecessary projections. In particular, if $e_2'$ is a tuple $(e_a, e_b)$ that is constructed solely to pass multiple arguments, then optimization will yield the simple expression $e_1'\,[e_a, e_b]$.

If the argument has a non-tuple, non-variable type, then the translation is straightforward:

\[ \frac{\Delta; \Gamma \vdash e_1 : \tau' \to \tau \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau' \Rightarrow e_2'}{\Delta; \Gamma \vdash e_1\,e_2 : \tau \Rightarrow e_1'\,e_2'} \]

If  the  argument  type  is  a  type  variable  (t),  then  I  must  decide  whether  or  not  to 
flatten  the  argument  into  multiple  arguments  using  typecase.  The  following  onearg 
function  calculates  a  coercion  for  a  function,  deciding  whether  or  not  to  pass  the  argument 
flattened,  based  on  t: 

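\[
\begin{array}{l}
\mathsf{onearg} = \Lambda t{::}\Omega.\,\Lambda t'{::}\Omega. \\
\quad \mathsf{typecase}\ t\ \mathsf{of} \\
\quad\quad\ \ \mathsf{Prod}(t_1, t_2) \Rightarrow \lambda[f{:}[T(t_1), T(t_2)] \to T(t')].\,\lambda[x{:}T(\mathsf{Prod}(t_1, t_2))].\,f\,[\pi_1\,x, \pi_2\,x] \\
\quad\quad \mid \_ \Rightarrow \lambda[f{:}[T(t)] \to T(t')].\,f
\end{array}
\]

It is easy to verify that onearg translates functions from their Vararg calling convention so that they take one argument.

Lemma 5.2.4 $\vdash \mathsf{onearg} : \forall t{::}\Omega.\,\forall t'{::}\Omega.\,[T(\mathsf{Vararg}\ t\ t')] \to [T(t)] \to T(t')$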
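A matching Python sketch of onearg is given below, under the same illustrative assumptions as for vararg: constructors are tagged tuples, products are Python pairs, and Python arity stands in for the calling convention. The helper apply_translated is hypothetical and mirrors the shape of a translated application.

```python
def onearg(t, t_res):
    """Coerce a function in the Vararg convention back to one argument.

    t_res mirrors the second constructor argument; it is never inspected.
    """
    if t[0] == "Prod":
        # The callee expects the components separately: project and pass.
        return lambda f: (lambda x: f(x[0], x[1]))
    # Any other constructor: the callee already takes a single argument.
    return lambda f: f


def apply_translated(t, t_res, f, arg):
    """Shape of a translated application: (onearg[t][t_res] f) arg."""
    return onearg(t, t_res)(f)(arg)


INT = ("Int",)
PAIR = ("Prod", INT, INT)

add = lambda a, b: a + b          # a flattened, two-argument callee
total = apply_translated(PAIR, INT, add, (3, 4))
succ = apply_translated(INT, INT, lambda n: n + 1, 5)
```

When the constructor is known statically, an optimizer could resolve the branch in onearg ahead of time, leaving either a direct two-argument call or a plain call.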

Hence,  a  simple  translation  of  application  is  as  follows: 

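\[ \frac{\Delta; \Gamma \vdash e_1 : \tau_1 \to \tau_2 \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau_1 \Rightarrow e_2'}{\Delta; \Gamma \vdash e_1\,e_2 : \tau_2 \Rightarrow (\mathsf{onearg}[|\tau_1|][|\tau_2|]\,e_1')\,e_2'} \]

The following lemma shows that this translation of application obeys the type translation.

Lemma 5.2.5 If $|\Delta|; |\Gamma| \vdash e_1' : |\tau_1 \to \tau_2|$ and $|\Delta|; |\Gamma| \vdash e_2' : |\tau_1|$, then $|\Delta|; |\Gamma| \vdash (\mathsf{onearg}[|\tau_1|][|\tau_2|]\,e_1')\,e_2' : |\tau_2|$.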

Again, an optimizer can inline and eliminate the call to onearg when $|\tau_1|$ is known and can then decide whether or not to flatten $e_2'$. Furthermore, the following lemma shows that onearg is a left-inverse of vararg. This gives an optimizer the opportunity to replace

\[ \mathsf{onearg}[\mu_1][\mu_2]\,(\mathsf{vararg}[\mu_1][\mu_2]\ v) \]

with simply $v$ even when the argument type of the function $v$ is unknown. Henglein and Jørgensen suggest a similar approach to eliminate excessive Leroy-style coercions [65].

Lemma 5.2.6 If $\vdash v : T(\mu') \to T(\mu)$ and $\vdash v' : T(\mu')$, then

\[ (\mathsf{onearg}[\mu'][\mu]\,(\mathsf{vararg}[\mu'][\mu]\ v))\ v' \Downarrow v'' \]

iff $v\ v' \Downarrow v''$.

Proof: By induction on the normal form of $\mu'$. □
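The left-inverse property can be exercised on concrete instances with a short Python sketch. As in the earlier sketches, the tagged-tuple constructors and the use of Python arity as a stand-in for calling conventions are assumptions made for illustration, and roundtrip is a hypothetical helper. The checks cover only a product and a non-product instance and are not a substitute for the proof.

```python
def vararg(t, t_res):
    """One-argument function -> Vararg calling convention for domain t."""
    if t[0] == "Prod":
        return lambda f: (lambda x1, x2: f((x1, x2)))
    return lambda f: f


def onearg(t, t_res):
    """Vararg calling convention for domain t -> one-argument function."""
    if t[0] == "Prod":
        return lambda f: (lambda x: f(x[0], x[1]))
    return lambda f: f


def roundtrip(t, t_res, v):
    """onearg[t][t_res] (vararg[t][t_res] v)."""
    return onearg(t, t_res)(vararg(t, t_res)(v))


INT = ("Int",)
PAIR = ("Prod", INT, INT)

area = lambda p: p[0] * p[1]   # a one-argument function on pairs
inc = lambda n: n + 1          # a one-argument function on integers

# In both cases the round trip behaves like the original function.
area_again = roundtrip(PAIR, INT, area)
inc_again = roundtrip(INT, INT, inc)
```

In the product case the round trip projects the pair and then rebuilds it, which is exactly the work an optimizer avoids by rewriting the composition to $v$.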

5.2.6  Translation  of  Type  Abstraction  and  Application 

When translating a def, I translate the polymorphic value $v$ yielding $v'$ and bind this using let. I translate an instantiation $v[\tau_1, \ldots, \tau_n]$ by applying the translation of $v$ to the constructors generated by $|\tau_1|, \ldots, |\tau_n|$. I translate polymorphic variables to themselves and I translate type-abstractions to constructor abstractions.

This  part  of  the  term  translation  may  appear  innocuous  at  first,  but  it  is  significant: 
Traditional  compilers  ignore  type  applications  and  type  abstractions  since  they  do  not 
use  dynamic  type  dispatch.  Hence,  they  pass  no  arguments  to  polymorphic  values  when 
they  are  instantiated. 

In my translation, I turn a type-abstraction into a function that takes constructors as arguments and pass the appropriate constructor arguments to the abstraction at run-time. In some sense, building the constructor arguments and passing them at run-time is the “overhead” of dynamic type dispatch because I must do this whether or not the abstraction examines the constructors via typerec. Within a compilation unit, an optimizer may determine that some constructor arguments are not used by a polymorphic function and modify local call sites so that they do not build or pass these constructors at run time. In at least some cases, however, type application will require building and passing constructors at run time.


5.3  Correctness  of  the  Translation 

From the lemmas regarding peq, vararg, and onearg, and the commutativity of type translation with substitution, it is easy to show by induction on the derivation of $\Delta; \Gamma \vdash e : \tau \Rightarrow e'$ that the term translation preserves types modulo the type translation.

Lemma 5.3.1 If $\Delta; \Gamma \vdash e : \sigma \Rightarrow e'$, then $|\Delta|; |\Gamma| \vdash e' : |\sigma|$.

To prove the correctness of the translation, I want to assert that a Mini-ML expression terminates with a value iff its translation terminates with an "equivalent" value. Equivalence of values is easy to define at base types: syntactic equality will do nicely. But what should be the definition of equivalence for arrow types? I need a notion of semantic equivalence that captures the idea that functions are equivalent when they compute equivalent answers, given equivalent arguments.

It seems as though I am stuck: to determine whether a Mini-ML expression and its translation compute equivalently, I must define what it means for Mini-ML and $\lambda_i^{ML}$-Rep values to be equivalent. But to define what it means for two values to be equivalent, in particular what it means for two functions to be equivalent, I need to define what it means for two expressions to compute equivalently. How can I formulate these two relations that are defined in terms of one another?

The  answer  to  this  dilemma  is  to  simultaneously  define  these  simulation  relations, 
but  index  the  relations  by  types.  I  will  start  by  defining  value  equivalence  and  expression 
equivalence  at  closed,  base  types  and  then  logically  extend  the  notions  of  equivalence  for 
higher  types  in  terms  of  the  equivalence  relations  indexed  by  the  component  types.  In 
the  end,  I  will  generate  a  family  of  inductively  defined  relations  which  will  allow  us  to 
argue  by  induction  on  the  type  of  an  expression  that  its  translation  is  correct. 

I begin by defining an auxiliary relation between closed Mini-ML monotypes and closed $\lambda_i^{ML}$-Rep constructors that respects constructor equivalence:

$$\frac{|\tau| = \mu' \qquad \vdash \mu' \equiv \mu :: \Omega}{\tau \approx \mu}$$

In Figure 5.5, I give suitable relations between closed Mini-ML and $\lambda_i^{ML}$-Rep terms. The relations are indexed by closed Mini-ML type schemes. Two computations are related if, whenever one evaluates to a value, then the other evaluates to a related value. Two values are related at base type if they are syntactically equal. Two values are related at a product type if projecting the corresponding components yields related computations at the component type. Two values of arrow type are related if, whenever we have values related at the domain type, applying the functions to the values yields related computations. In the case of the $\lambda_i^{ML}$-Rep function, we must first coerce the function to take one argument via the onearg function. Finally, values are related at the type scheme $\forall t_1, \ldots, t_n.\tau$ if, whenever applied to closed, related monotypes and constructors, they yield related computations at the type obtained by substituting the monotypes for the type variables.

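The relations $e \approx_\sigma e'$, $v \approx_\tau v'$, and $v \approx_\sigma v'$ are well founded even though their definitions refer to one another, because either the size of the type index decreases, or else the number of quantifiers in the type index decreases. This is ensured because Mini-ML is predicative (i.e., only monotypes can instantiate type variables).

Expression Relations:

$$\frac{e \Downarrow v \ \text{ iff } \ e' \Downarrow v' \text{ and } v \approx_\sigma v'}{e \approx_\sigma e'}$$

Value Relations:

$$i \approx_{int} i \qquad f \approx_{float} f \qquad \langle\rangle \approx_{unit} \langle\rangle$$

$$\frac{\pi_1\, v \approx_{\tau_1} \pi_1\, v' \qquad \pi_2\, v \approx_{\tau_2} \pi_2\, v'}{v \approx_{(\tau_1 \times \tau_2)} v'}$$

$$\frac{\forall v_1, v_1'.\ v_1 \approx_{\tau'} v_1' \text{ implies } v\, v_1 \approx_{\tau} (\mathrm{onearg}[|\tau'|][|\tau|]\ v')\,(v_1')}{v \approx_{\tau' \to \tau} v'}$$

$$\frac{\forall \tau_1, \mu_1, \ldots, \tau_n, \mu_n.\ \tau_1 \approx \mu_1, \ldots, \tau_n \approx \mu_n \text{ implies } v[\tau_1, \ldots, \tau_n] \approx_{\{\tau_1/t_1, \ldots, \tau_n/t_n\}\tau} v'[\mu_1] \cdots [\mu_n]}{v \approx_{\forall t_1, \ldots, t_n.\tau} v'}$$

Figure 5.5: Relating Mini-ML to $\lambda_i^{ML}$-Rep

The monotype/constructor relation is extended to substitutions $\delta$ and $\delta'$, indexed by a set of type variables $\Delta$, where $\delta$ maps type variables to closed Mini-ML monotypes and $\delta'$ maps type variables to closed $\lambda_i^{ML}$-Rep constructors, as follows:

$$\frac{Dom(\delta) = Dom(\delta') = Dom(\Delta) \qquad \forall t \in \Delta.\ \delta(t) \approx \delta'(t)}{\delta \approx_\Delta \delta'}$$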

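The term relation is extended to pairs of substitutions, $\delta; \gamma$ and $\delta'; \gamma'$, indexed by $\Delta; \Gamma$, where $\delta$ and $\delta'$ are as above and $\gamma$ and $\gamma'$ are substitutions from term variables to values. I assume that all free type variables occurring in the range of $\Gamma$ are in $\Delta$.

$$\frac{\delta \approx_\Delta \delta' \qquad \Delta \vdash \Gamma \qquad Dom(\gamma) = Dom(\gamma') = Dom(\Gamma) \qquad \forall x \in Dom(\Gamma).\ \delta(\gamma(x)) \approx_{\delta(\Gamma(x))} \delta'(\gamma'(x))}{\delta; \gamma \approx_{\Delta;\Gamma} \delta'; \gamma'}$$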

With  these  definitions,  I  can  begin  to  establish  the  correctness  of  the  term  translation. 
The  first  step  is  to  show  that  peq  has  the  appropriate  behavior. 

Lemma 5.3.2 If $v_1 \approx_\tau v_1'$ and $v_2 \approx_\tau v_2'$, then $\mathrm{eq}(v_1, v_2) \approx_{int} \mathrm{peq}[|\tau|][v_1', v_2']$.

Proof:  By  induction  on  r.  □ 

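Next, I argue that, under appropriate circumstances, abstracting related values with respect to related expressions yields related functions. This follows almost directly from the definitions of the relations and the fact that onearg is a left-inverse of vararg.

Lemma 5.3.3 Suppose $\Delta; \Gamma \uplus \{x{:}\tau'\} \vdash e : \tau \Rightarrow e'$, and for all $\delta; \gamma \uplus \{x{=}v\} \approx_{\Delta;\Gamma \uplus \{x:\tau'\}} \delta'; \gamma' \uplus \{x{=}v'\}$, $\delta(\gamma \uplus \{x{=}v\}(e)) \approx_{\delta(\tau)} \delta'(\gamma' \uplus \{x{=}v'\}(e'))$. Then for all $\delta; \gamma \approx_{\Delta;\Gamma} \delta'; \gamma'$,

$$\delta(\gamma(\lambda x{:}\tau'.e)) \approx_{\delta(\tau' \to \tau)} \delta'(\gamma'(\mathrm{vararg}[|\tau'|][|\tau|](\lambda x{:}T(|\tau'|).e'))).$$

Proof: Let $\delta; \gamma \approx_{\Delta;\Gamma} \delta'; \gamma'$ and let $v \approx_{\delta(\tau')} v'$. I must show:

$$(\delta(\gamma(\lambda x{:}\tau'.e)))\ v \approx_{\delta(\tau)} (\mathrm{onearg}[\delta'(|\tau'|)][\delta'(|\tau|)](\delta'(\gamma'(\mathrm{vararg}[|\tau'|][|\tau|](\lambda x{:}T(|\tau'|).e')))))\ v'.$$

This holds iff:

$$(\lambda x{:}\delta(\tau').\delta(\gamma(e)))\ v \approx_{\delta(\tau)} (\mathrm{onearg}[\delta'(|\tau'|)][\delta'(|\tau|)](\mathrm{vararg}[\delta'(|\tau'|)][\delta'(|\tau|)](\lambda x{:}T(\delta'(|\tau'|)).\delta'(\gamma'(e')))))\ v'.$$

Since onearg is a left-inverse of vararg (see Lemma 5.2.6), this holds iff:

$$(\lambda x{:}\delta(\tau').\delta(\gamma(e)))\ v \approx_{\delta(\tau)} (\lambda x{:}T(\delta'(|\tau'|)).\delta'(\gamma'(e')))\ v'$$

which holds iff:

$$\delta(\gamma \uplus \{x{=}v\}(e)) \approx_{\delta(\tau)} \delta'(\gamma' \uplus \{x{=}v'\}(e'))$$

which follows by assumption. □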

Finally,  I  establish  the  correctness  of  the  translation  by  showing  that  applying  related 
substitutions  to  an  expression  and  its  translation  yields  related  computations. 

Theorem 5.3.4 If $\Delta; \Gamma \vdash e : \sigma \Rightarrow e'$ and $\delta; \gamma \approx_{\Delta;\Gamma} \delta'; \gamma'$, then $\delta(\gamma(e)) \approx_{\delta(\sigma)} \delta'(\gamma'(e'))$.

Proof: By induction on the derivation of $\Delta; \Gamma \vdash e : \sigma \Rightarrow e'$. Let $\delta; \gamma \approx_{\Delta;\Gamma} \delta'; \gamma'$. The int, float, and unit cases follow trivially. The var case follows from the assumptions regarding $\delta; \gamma$ and $\delta'; \gamma'$. The equality case follows from Lemma 5.3.2. The if0 case follows since related values at int must be the same integer. The pair case follows from the inductive assumptions and the proj case follows from the definition of the relations at product types. The abs case follows from Lemma 5.3.3 and the app case follows from the definition of the relations at arrow types. The tapp case follows from the definition of the relations at type schemes, and the fact that $\delta(\tau) \approx \delta'(|\tau|)$ by Lemma 5.2.1. □

Corollary 5.3.5 (Translation Correctness) If $\vdash e : \sigma \Rightarrow e'$, then $e \approx_\sigma e'$.

Proof: Suppose $\emptyset; \emptyset \vdash e : \sigma \Rightarrow e'$. Taking $\delta = \delta' = \emptyset$ and $\gamma = \gamma' = \emptyset$, we have $\delta; \gamma \approx_{\emptyset;\emptyset} \delta'; \gamma'$. By the previous theorem, $\delta(\gamma(e)) \approx_{\delta(\sigma)} \delta'(\gamma'(e'))$, thus $e \approx_\sigma e'$. □

To summarize, I have defined a translation from Mini-ML to $\lambda_i^{ML}$-Rep that eliminates polymorphic equality and flattens function arguments. The type translation uses Typerec to define Vararg, which determines calling conventions for functions. The term translation uses typerec to define peq, vararg, and onearg. The vararg term converts a function from taking one argument so that it has the proper calling convention according to Vararg. Conversely, the onearg term converts a function from its Vararg calling convention to one argument.

A real source language like SML provides n-tuples (for arbitrary n) instead of only binary tuples. Extending the flattening translation so that it flattens argument tuples of fewer than k components is straightforward: we simply use k + 1 cases in the definition of Vararg to determine the proper calling convention:

$$\begin{array}{l}
\mathrm{Vararg} = \lambda t{::}\Omega.\ \lambda t'{::}\Omega. \\
\quad \mathrm{Typecase}\ t\ \mathrm{of} \\
\qquad\ \ \mathrm{Prod}(t_1) \Rightarrow \mathrm{Arrow}(t_1, t') \\
\qquad |\ \mathrm{Prod}(t_1, t_2) \Rightarrow \mathrm{Arrow}([t_1, t_2], t') \\
\qquad |\ \mathrm{Prod}(t_1, t_2, t_3) \Rightarrow \mathrm{Arrow}([t_1, t_2, t_3], t') \\
\qquad\ \ \vdots \\
\qquad |\ \mathrm{Prod}(t_1, \ldots, t_k) \Rightarrow \mathrm{Arrow}([t_1, \ldots, t_k], t') \\
\qquad |\ \_ \Rightarrow \mathrm{Arrow}(t, t')
\end{array}$$

Similarly, the definitions of vararg and onearg will require k + 1 cases.
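The k + 1 cases can be sketched as a single bounded test on the number of components. This is a rough illustration, assuming constructors are tagged Python tuples, that argument lists are Python lists, and that the bound `K = 3` is an arbitrary choice; none of these encodings come from the text.

```python
K = 3  # flattening bound k; the value 3 is an arbitrary choice for this sketch

INT = ("Int",)

def Prod(*components):
    return ("Prod",) + components

def Arrow(domain, result):
    return ("Arrow", domain, result)

def flattens(t):
    """True for the k Prod cases, False for the default case."""
    return t[0] == "Prod" and 1 <= len(t) - 1 <= K

def vararg_con(t, t_result):
    """Constructor-level Vararg: choose a calling convention from the argument type."""
    if flattens(t):
        return Arrow(list(t[1:]), t_result)   # components become separate arguments
    return Arrow(t, t_result)                 # default: a single argument

def vararg(t, f):
    """Term-level vararg: convert a one-argument function to the convention above."""
    if flattens(t):
        return lambda *args: f(tuple(args))
    return f

def onearg(t, g):
    """Term-level onearg: convert back to a function of one (tuple) argument."""
    if flattens(t):
        return lambda tup: g(*tup)
    return g
```

For example, `vararg(Prod(INT, INT, INT), f)` yields a three-argument function, while a four-component product falls through to the default case and is still passed as one tuple.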


5.4  Compiling  Other  Constructs 

Dynamic type dispatch can be used to support a variety of language mechanisms in the presence of unknown types. In this section, I show how the dynamic type dispatch facilities of $\lambda_i^{ML}$ can be used to support flattened data structures (such as C-style structs and arrays), Haskell-style type classes [53], and polymorphic communication primitives.

5.4.1  C-style  Structs 

Languages  like  C  provide  flattened  data  structures  by  default.  Programmers  explicitly 
specify  when  they  want  to  use  pointers.  This  gives  programmers  control  over  both  sharing 
and  data  layout.  For  example,  a  C  struct  (i.e. ,  record)  with  nested  struct  components 
such  as 


struct {
  struct {int x; double y;} a;
  int b;
  struct {double f; double g;} c;
};

is typically represented in the same way as a flattened struct made out of the primitive components (ignoring alignment constraints):

struct {
  int x;
  double y;
  int b;
  double f;
  double g;
};

In  effect,  a  C  compiler  performs  a  type-directed  translation  that  eliminates  nested  structs. 
To  perform  this  flattening,  a  C  compiler  relies  upon  there  being  no  unknown  types  at 
compile  time.  In  this  section,  I  show  how  to  use  dynamic  type  analysis  to  flatten  structs 
in  the  presence  of  unknown  types. 
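For this particular pair of declarations, the claim about identical representations can be checked on a given platform with Python's standard `ctypes` module, which lays out `Structure` subclasses according to the native C ABI. The class names `A`, `C`, `Nested`, and `FlatStruct` are hypothetical names introduced for this sketch. The check compares computed offsets rather than assuming specific byte values, and it says nothing about other struct shapes, where alignment can make nested and flattened layouts differ.

```python
import ctypes

class A(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_double)]

class C(ctypes.Structure):
    _fields_ = [("f", ctypes.c_double), ("g", ctypes.c_double)]

class Nested(ctypes.Structure):
    _fields_ = [("a", A), ("b", ctypes.c_int), ("c", C)]

class FlatStruct(ctypes.Structure):
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_double),
                ("b", ctypes.c_int),
                ("f", ctypes.c_double), ("g", ctypes.c_double)]

# Byte offset of each primitive component inside the nested struct.
nested_offsets = [
    Nested.a.offset + A.x.offset,
    Nested.a.offset + A.y.offset,
    Nested.b.offset,
    Nested.c.offset + C.f.offset,
    Nested.c.offset + C.g.offset,
]

# Byte offset of the same components in the flattened struct.
flat_offsets = [getattr(FlatStruct, name).offset
                for name in ("x", "y", "b", "f", "g")]
```

Comparing `nested_offsets` with `flat_offsets` and the two `ctypes.sizeof` results shows whether the two declarations share a layout on the machine at hand.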

I begin by extending $\lambda_i^{ML}$-Rep to support records with an arbitrary number of primitive components. To this end, I add list kinds ($\kappa^*$) with introductory constructors $\mathrm{Nil}_\kappa$ and $\mathrm{Cons}(\mu_1, \mu_2)$, and an eliminatory constructor $\mathrm{Listrec}\ \mu\ \mathrm{of}\ (\mu_n; \mu_c)$. Similar to Typerec, the Listrec constructor provides a means for folding a computation across a list of constructors. I then replace Unit and $\mathrm{Prod}(\mu_1, \mu_2)$ with a single constructor $\mathrm{Struct}(\mu)$, where $\mu$ is a constructor of kind $\Omega^*$ (i.e., a list of monotypes). As before, we can generate the monotypes by induction; but for Structs, we require a dual induction to generate lists of monotypes. Therefore, I extend the Typerec constructor to provide a means for folding a computation across the list of $\Omega$ components of a Struct constructor. The resulting grammars for kinds and constructors are as follows:

$$\begin{array}{lrcl}
\text{(kinds)} & \kappa & ::= & \cdots \mid \kappa^* \\
\text{(constructors)} & \mu & ::= & \cdots \mid \mathrm{Nil}_\kappa \mid \mathrm{Cons}(\mu_1, \mu_2) \mid \mathrm{Struct}(\mu) \mid \\
 & & & \mathrm{Listrec}\ \mu\ \mathrm{of}\ (\mu_n; \mu_c) \mid \\
 & & & \mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_i; \mu_f; \mu_a; \mu_s)(\mu_n; \mu_c)
\end{array}$$

The formation rules for the constructors are straightforward except for Listrec and Typerec. A Listrec is well-formed with kind $\kappa$, provided its argument is a list, and it maps an empty list to a constructor of kind $\kappa$ and a non-empty list to a constructor of kind $\kappa$, given the head and tail of the list as arguments, as well as the result of folding the Listrec across the tail of the list.

$$\frac{\Delta \vdash \mu :: \kappa_1^* \qquad \Delta \vdash \mu_n :: \kappa \qquad \Delta \vdash \mu_c :: \kappa_1 \to \kappa_1^* \to \kappa \to \kappa}{\Delta \vdash \mathrm{Listrec}\ \mu\ \mathrm{of}\ (\mu_n; \mu_c) :: \kappa}$$

The formation rule for Typerec is as before, but we require extra constructors $\mu_n$ and $\mu_c$ in order to fold the Typerec across the list components of a Struct. In effect, we simultaneously define a Typerec with a Listrec. Therefore, the formation rule for Typerec is as follows¹:

$$\frac{\begin{array}{c}
\Delta \vdash \mu :: \Omega \qquad \Delta \vdash \mu_i :: \kappa \qquad \Delta \vdash \mu_f :: \kappa \\
\Delta \vdash \mu_a :: \Omega \to \Omega \to \kappa \to \kappa \to \kappa \qquad \Delta \vdash \mu_s :: \kappa' \to \kappa \\
\Delta \vdash \mu_n :: \kappa' \qquad \Delta \vdash \mu_c :: \Omega \to \Omega^* \to \kappa \to \kappa' \to \kappa'
\end{array}}{\Delta \vdash \mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_i; \mu_f; \mu_a; \mu_s)(\mu_n; \mu_c) :: \kappa}$$

The $\mu_n$ and $\mu_c$ clauses determine how the Typerec computation is folded across the components of a Struct, resulting in a constructor of kind $\kappa'$. The $\mu_s$ clause simply converts this $\kappa'$ constructor to a constructor of kind $\kappa$.

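The equivalences that govern Listrec and Typerec are as before: we choose the appropriate clause according to the head of the normal form of the argument, and unroll the computation on the component constructors. For a Struct, we unroll as follows:

$$\begin{array}{l}
\mathrm{Typerec}\ \mathrm{Struct}(\mu)\ \mathrm{of}\ (\mu_i; \mu_f; \mu_a; \mu_s)(\mu_n; \mu_c) \equiv \\
\quad \mu_s\ (\mathrm{Listrec}\ \mu\ \mathrm{of}\ (\mu_n;\ (\lambda t_a{::}\Omega.\ \lambda t_b{::}\Omega^*.\ \lambda t_b'{::}\kappa'. \\
\qquad \mu_c\ t_a\ t_b\ (\mathrm{Typerec}\ t_a\ \mathrm{of}\ (\mu_i; \mu_f; \mu_a; \mu_s)(\mu_n; \mu_c))\ t_b')))
\end{array}$$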
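The unrolling for Struct can be read as an ordinary pair of mutually dependent folds. The following sketch assumes constructors are tagged Python tuples and computes plain Python values rather than constructors; the example fold `leaves` is hypothetical and only serves to exercise the clauses.

```python
INT, FLOAT = ("Int",), ("Float",)

def Arrow(a, b):
    return ("Arrow", a, b)

def Struct(*components):
    return ("Struct", tuple(components))

def listrec(lst, mu_n, mu_c):
    """Fold over a list: Nil gives mu_n, Cons(h, t) gives mu_c(h, t, fold of t)."""
    if not lst:
        return mu_n
    head, tail = lst[0], lst[1:]
    return mu_c(head, tail, listrec(tail, mu_n, mu_c))

def typerec(mu, mu_i, mu_f, mu_a, mu_s, mu_n, mu_c):
    """Fold over a monotype, unrolling Struct through listrec as in the equivalence."""
    rec = lambda t: typerec(t, mu_i, mu_f, mu_a, mu_s, mu_n, mu_c)
    if mu == INT:
        return mu_i
    if mu == FLOAT:
        return mu_f
    if mu[0] == "Arrow":
        return mu_a(mu[1], mu[2], rec(mu[1]), rec(mu[2]))
    if mu[0] == "Struct":
        return mu_s(listrec(mu[1], mu_n,
                            lambda ta, tb, tb_acc: mu_c(ta, tb, rec(ta), tb_acc)))
    raise ValueError("unknown constructor")

def leaves(mu):
    """Example fold: count the Int and Float leaves of a constructor."""
    return typerec(mu, 1, 1,
                   lambda t1, t2, r1, r2: r1 + r2,    # mu_a
                   lambda total: total,               # mu_s: convert the list result
                   0,                                 # mu_n
                   lambda ta, tb, ra, acc: ra + acc)  # mu_c
```

The lambda passed to `listrec` plays the role of the abstraction over $t_a$, $t_b$, and $t_b'$ in the equivalence: it recurs on the head component with the full `typerec` and threads the accumulated list result through `mu_c`.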

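To the types, I add $\mathrm{struct}\{\sigma_1, \ldots, \sigma_n\}$ for $n \geq 0$, as well as the following equivalence relating Struct constructors and struct types:

$$\frac{\Delta \vdash \mu_1 :: \Omega \quad \cdots \quad \Delta \vdash \mu_n :: \Omega}{\Delta \vdash T(\mathrm{Struct}(\mathrm{Cons}(\mu_1, \cdots \mathrm{Cons}(\mu_n, \mathrm{Nil}_\Omega)))) \equiv \mathrm{struct}\{T(\mu_1), \ldots, T(\mu_n)\}}$$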

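To the terms, I first add listrec so that we may fold a term-level computation across a list of constructors:

$$\text{(lrec)} \quad \frac{\begin{array}{c}
\Delta \vdash \mu :: \kappa^* \qquad \Delta; \Gamma \vdash e_n : \{\mathrm{Nil}_\kappa/t\}\sigma \\
\Delta; \Gamma \vdash e_c : \forall t_1{::}\kappa.\forall t_2{::}\kappa^*.\ \{t_2/t\}\sigma \to \{\mathrm{Cons}(t_1, t_2)/t\}\sigma
\end{array}}{\Delta; \Gamma \vdash \mathrm{listrec}\ \mu\ \mathrm{of}\ [t{::}\kappa^*.\sigma](e_n; e_c) : \{\mu/t\}\sigma}$$

As I did at the constructor level, I extend typerec so that we simultaneously define how to fold a term-level computation across types and across lists of types:

$$\text{(trec)} \quad \frac{\begin{array}{c}
\Delta \vdash \mu :: \Omega \qquad \Delta; \Gamma \vdash e_i : \{\mathrm{Int}/t\}\sigma \qquad \Delta; \Gamma \vdash e_f : \{\mathrm{Float}/t\}\sigma \\
\Delta; \Gamma \vdash e_a : \forall t_1{::}\Omega.\forall t_2{::}\Omega.\ \{t_1/t\}\sigma \to \{t_2/t\}\sigma \to \{\mathrm{Arrow}(t_1, t_2)/t\}\sigma \\
\Delta; \Gamma \vdash e_s : \forall t_1{::}\Omega^*.\ \{t_1/t'\}\sigma' \to \{\mathrm{Struct}(t_1)/t\}\sigma
\end{array}}{\Delta; \Gamma \vdash \mathrm{typerec}\ \mu\ \mathrm{of}\ [t{::}\Omega.\sigma](e_i; e_f; e_a; e_s)[t'{::}\Omega^*.\sigma'](e_n; e_c) : \{\mu/t\}\sigma}$$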


¹ To simplify the presentation, I only consider single-argument functions at the term level, and hence Arrow takes one domain constructor instead of k. It is straightforward to extend Arrow to take and return an arbitrary number of arguments and results by giving it kind $\Omega^* \to \Omega^* \to \Omega$.

I also add two sorts of intro and elim forms for structs. The first sort provides both an efficient mechanism for constructing structs ($\mathrm{struct}\{e_1, \ldots, e_n\}$) and an efficient mechanism for projecting a component from a struct ($\#i\ e$). The typing rules for these terms are standard:

$$\text{(struct)} \quad \frac{\Delta; \Gamma \vdash e_1 : \sigma_1 \quad \cdots \quad \Delta; \Gamma \vdash e_n : \sigma_n}{\Delta; \Gamma \vdash \mathrm{struct}\{e_1, \ldots, e_n\} : \mathrm{struct}\{\sigma_1, \ldots, \sigma_n\}} \quad (n \geq 0)$$

$$\text{(select)} \quad \frac{\Delta; \Gamma \vdash e : \mathrm{struct}\{\sigma_1, \ldots, \sigma_n\}}{\Delta; \Gamma \vdash \#i\ e : \sigma_i} \quad (1 \leq i \leq n)$$

However, these terms do not provide a means for constructing or deconstructing a struct by induction on the list of component types. Consider, for example, extending the polymorphic equality term of the previous section to compare arbitrary structs:

$$\begin{array}{lcl}
\mathrm{peq}[\mathrm{Int}] & = & \lambda[x_1{:}int, x_2{:}int].\ \mathrm{eqint}(x_1, x_2) \\
\mathrm{peq}[\mathrm{Float}] & = & \lambda[x_1{:}float, x_2{:}float].\ \mathrm{eqfloat}(x_1, x_2) \\
\mathrm{peq}[\mathrm{Struct}(t)] & = & \lambda[x_1{:}T(\mathrm{Struct}(t)), x_2{:}T(\mathrm{Struct}(t))].\ ???
\end{array}$$


I need to project the components of the structure and compare them at their respective types. I cannot use select since both i and n must be determined at compile time, and the length of the list of constructors t is unknown. I therefore add a second sort of intro and elim forms that allows us to construct ($\mathrm{cons}(e_1, e_2)$) and deconstruct structs ($\mathrm{head}\ e$ and $\mathrm{tail}\ e$) by induction on the list of components. These terms have the following formation rules:

$$\text{(cons)} \quad \frac{\Delta; \Gamma \vdash e_1 : T(\mu_1) \qquad \Delta; \Gamma \vdash e_2 : T(\mathrm{Struct}(\mu_2))}{\Delta; \Gamma \vdash \mathrm{cons}(e_1, e_2) : T(\mathrm{Struct}(\mathrm{Cons}(\mu_1, \mu_2)))}$$

$$\text{(hd)} \quad \frac{\Delta; \Gamma \vdash e : T(\mathrm{Struct}(\mathrm{Cons}(\mu_1, \mu_2)))}{\Delta; \Gamma \vdash \mathrm{head}\ e : T(\mu_1)}$$

$$\frac{\Delta; \Gamma \vdash e : T(\mathrm{Struct}(\mathrm{Cons}(\mu_1, \mu_2)))}{\Delta; \Gamma \vdash \mathrm{tail}\ e : T(\mathrm{Struct}(\mu_2))}$$

Operationally, cons takes values $v$ and $\mathrm{struct}\{v_1, \ldots, v_n\}$, and constructs the new value $\mathrm{struct}\{v, v_1, \ldots, v_n\}$. Correspondingly, head and tail take a value $\mathrm{struct}\{v_1, v_2, \ldots, v_n\}$ and return values $v_1$ and $\mathrm{struct}\{v_2, \ldots, v_n\}$, respectively.
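With head and tail available, the missing Struct case of peq can be written by induction on the list of component constructors. The sketch below is one plausible reading, assuming struct values are Python tuples and constructors are tagged tuples; the helper `peq_list` is a hypothetical name standing in for a listrec-style fold.

```python
INT, FLOAT = ("Int",), ("Float",)

def Struct(*components):
    return ("Struct", tuple(components))

# Struct values are modelled as Python tuples.
def cons(v, s):
    return (v,) + s

def head(s):
    return s[0]

def tail(s):
    return s[1:]

def peq(t):
    """Equality at constructor t, returning 1 for equal and 0 otherwise."""
    if t in (INT, FLOAT):
        return lambda x1, x2: 1 if x1 == x2 else 0
    if t[0] == "Struct":
        return peq_list(t[1])
    raise TypeError("no equality at this constructor")

def peq_list(ts):
    """Induction on the list of component constructors, in the style of listrec."""
    if not ts:
        return lambda x1, x2: 1          # two empty structs are equal
    eq_head, eq_tail = peq(ts[0]), peq_list(ts[1:])
    def compare(x1, x2):
        if eq_head(head(x1), head(x2)) == 0:
            return 0
        return eq_tail(tail(x1), tail(x2))
    return compare
```

Neither the index nor the length of the struct appears as a compile-time constant here; both are driven entirely by the constructor list passed at run time.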

It is possible to effectively define the other struct primitives with cons, head, and tail². For example, $\mathrm{struct}\{e_1, e_2, \ldots, e_n\}$ can be defined as

$$\mathrm{struct}\{e_1, e_2, \ldots, e_n\} = \mathrm{cons}(e_1, \mathrm{cons}(e_2, \cdots \mathrm{cons}(e_n, \mathrm{struct}\{\}) \cdots)),$$

whereas $\#i\ e$ can be defined as

$$\begin{array}{lcl}
\#1\ e & = & \mathrm{head}\ e \\
\#i\ e & = & \#(i-1)\ (\mathrm{tail}\ e)
\end{array}$$

² The encoding is not quite complete, because cons, head, and tail only operate on structs of monotypes, whereas the other operations can operate on structs of polytypes.

However, these encodings generate many intermediate structs that are simply discarded. Some implementation strategies may avoid creating most if not all of these structs. For instance, in C, the tail operation can be implemented by returning the address of the second component of a struct. Unfortunately, many memory management strategies forbid pointers into the middle of objects. TIL implements tail by returning a logical pointer or cursor instead of an actual pointer (see Chapter 8). The logical pointer is implemented as a pair, which consists of a pointer to the beginning of the original struct and an integer offset. Adding the offset to the pointer yields the logical pointer. This approach is compatible with most memory management strategies and provides an O(1) tail. Unfortunately, this still makes projecting the ith component an O(i) operation. Therefore, I use the select operation ($\#i\ e$) for O(1) access whenever possible, and only use head and tail when iterating across a struct. Likewise, I use $\mathrm{struct}\{e_1, \ldots, e_n\}$ whenever possible to avoid creating any intermediate structs, and only use cons when necessary.
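The cursor idea can be sketched in a few lines. This is only an illustration of the pair-of-pointer-and-offset representation described above, not TIL's actual implementation; the class `Cursor` and the function `select` are hypothetical names, and a Python tuple stands in for the underlying struct.

```python
class Cursor:
    """A logical pointer: the original struct plus an integer offset."""

    def __init__(self, base, offset=0):
        self.base = base        # shared reference to the whole struct, never copied
        self.offset = offset

    def head(self):
        return self.base[self.offset]

    def tail(self):
        # O(1): build a new pair and leave the underlying struct untouched.
        return Cursor(self.base, self.offset + 1)

    def is_empty(self):
        return self.offset == len(self.base)

def select(i, cursor):
    """#i e encoded with head/tail (1-based): i - 1 tail steps, hence O(i)."""
    for _ in range(i - 1):
        cursor = cursor.tail()
    return cursor.head()
```

Each `tail` call allocates only the small pair, and no pointer ever refers to the middle of the struct, which is the property that keeps the scheme compatible with most memory managers.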

With  these  additions  to  the  target  language,  I  can  now  define  a  translation  that 
maps  Mini-ML  tuples  to  flattened  structs.  Of  course,  flattening  all  tuples  is  not  a  good 
strategy,  but  the  source  language  can  provide  two  sorts  of  tuple  types  —  those  that  should 
be  flattened  and  those  that  should  not  —  and  hence  leave  the  choice  of  representation 
to  the  programmer  (as  in  C).  Alternatively,  a  compiler  may  perform  some  analysis  to 
determine  a  representation  that  is  likely  to  be  beneficial.  To  demonstrate  the  key  ideas, 
I  will  simply  assume  that  all  products  are  to  be  flattened. 

The type translation is as before (see Section 5.2.1), except for unit and products. The translation of unit is simply an empty structure. In the translation of a tuple type, I use two auxiliary constructor functions, Flat and Append:

$$\begin{array}{lcl}
|unit| & = & \mathrm{Struct}(\mathrm{Nil}_\Omega) \\
|(\tau_1 \times \tau_2)| & = & \mathrm{Struct}(\mathrm{Append}[\mathrm{Flat}[|\tau_1|]][\mathrm{Flat}[|\tau_2|]])
\end{array}$$

These two functions are defined using the pattern matching notation for Typecase and Listrec as follows: Flat takes a monotype and, if it is a Struct, returns the list of components of the Struct. Otherwise, Flat conses the given monotype onto nil to create a singleton list:

$$\begin{array}{l}
\mathrm{Flat} :: \Omega \to \Omega^* \\
\mathrm{Flat} = \lambda t{::}\Omega.\ \mathrm{Typecase}\ t\ \mathrm{of}\ \mathrm{Struct}(t) \Rightarrow t \\
\qquad\qquad\qquad\qquad\qquad\quad |\ \_ \Rightarrow \mathrm{Cons}(t, \mathrm{Nil}_\Omega)
\end{array}$$

Append takes two lists of monotypes and returns their concatenation:

$$\begin{array}{l}
\mathrm{Append} :: \Omega^* \to \Omega^* \to \Omega^* \\
\mathrm{Append}[\mathrm{Nil}_\Omega][t] = t \\
\mathrm{Append}[\mathrm{Cons}(t_1, t_2)][t] = \mathrm{Cons}(t_1, \mathrm{Append}[t_2][t])
\end{array}$$

Therefore, the type translation of a product results in a struct where the structs resulting from nested product components have been flattened and appended together.
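A small executable reading of Flat, Append, and the product case of the type translation follows. It assumes constructors are tagged Python tuples, constructor lists are Python tuples, and source types are written as strings or `("prod", t1, t2)` triples; all of these encodings are choices made for the sketch.

```python
INT, FLOAT = ("Int",), ("Float",)

def Struct(components):
    return ("Struct", tuple(components))

def flat_con(t):
    """Flat: the component list of a Struct, or a singleton list otherwise."""
    return t[1] if t[0] == "Struct" else (t,)

def append_con(l1, l2):
    """Append, by recursion on the first list as in the definition above."""
    if not l1:
        return l2
    return (l1[0],) + append_con(l1[1:], l2)

def translate(tau):
    """Type translation restricted to int, float, unit, and binary products."""
    if tau == "int":
        return INT
    if tau == "float":
        return FLOAT
    if tau == "unit":
        return Struct(())
    _, t1, t2 = tau                      # tau is ("prod", t1, t2)
    return Struct(append_con(flat_con(translate(t1)), flat_con(translate(t2))))

# Two groupings of the same three components.
left = translate(("prod", ("prod", "int", "float"), "int"))
right = translate(("prod", "int", ("prod", "float", "int")))
```

Both `left` and `right` come out as the same three-component Struct, because the nested product is flattened before the lists are appended.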

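The term translation is as before except for tuple creation and projection. Unit simply maps to an empty struct. Binary tuples are translated according to the following rule:

$$\frac{\Delta; \Gamma \vdash e_1 : \tau_1 \Rightarrow e_1' \qquad \Delta; \Gamma \vdash e_2 : \tau_2 \Rightarrow e_2'}{\Delta; \Gamma \vdash (e_1, e_2) : (\tau_1 \times \tau_2) \Rightarrow \mathrm{append}[\mathrm{Flat}[|\tau_1|]][\mathrm{Flat}[|\tau_2|]](\mathrm{flat}[|\tau_1|]\ e_1')\ (\mathrm{flat}[|\tau_2|]\ e_2')}$$

The translation uses auxiliary term functions flat and append, which are defined using typecase and listrec as follows: The flat function takes a constructor t and a value of type T(t). If the value is a structure, it simply returns that value. If the value is not a structure, then flat places it in a structure.

$$\begin{array}{l}
\mathrm{flat} : \forall t{::}\Omega.\ T(t) \to T(\mathrm{Flat}[t]) \\
\mathrm{flat} = \Lambda t{::}\Omega.\ \mathrm{typecase}\ t\ \mathrm{of} \\
\qquad\ \ \mathrm{Struct}(t) \Rightarrow \lambda x{:}T(\mathrm{Struct}(t)).\ x \\
\qquad |\ \_ \Rightarrow \lambda x{:}T(t).\ \mathrm{struct}\{x\}
\end{array}$$

The append function takes two structs and concatenates their contents, yielding a flattened struct:

$$\begin{array}{l}
\mathrm{append} : \forall t_1{::}\Omega^*.\forall t_2{::}\Omega^*.\ T(\mathrm{Struct}(t_1)) \to T(\mathrm{Struct}(t_2)) \to T(\mathrm{Struct}(\mathrm{Append}[t_1][t_2])) \\
\mathrm{append}[\mathrm{Nil}_\Omega][t] = \lambda x{:}\mathrm{struct}\{\}.\ \lambda y{:}T(\mathrm{Struct}(t)).\ y \\
\mathrm{append}[\mathrm{Cons}(t_1, t_2)][t] = \lambda x{:}T(\mathrm{Struct}(\mathrm{Cons}(t_1, t_2))).\ \lambda y{:}T(\mathrm{Struct}(t)). \\
\qquad \mathrm{cons}(\mathrm{head}\ x,\ \mathrm{append}[t_2][t]\ (\mathrm{tail}\ x)\ y)
\end{array}$$

We can easily verify from the types of flat and append that the term translation for products respects the type translation.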

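The term translation of first and second tuple projections is given by the following two inference rules:

$$\frac{\Delta; \Gamma \vdash e : (\tau_1 \times \tau_2) \Rightarrow e'}{\Delta; \Gamma \vdash \pi_1\, e : \tau_1 \Rightarrow \mathrm{unflat}[|\tau_1|](\mathrm{proj1}[\mathrm{Flat}[|\tau_1|]][\mathrm{Flat}[|\tau_2|]]\ e')}$$

$$\frac{\Delta; \Gamma \vdash e : (\tau_1 \times \tau_2) \Rightarrow e'}{\Delta; \Gamma \vdash \pi_2\, e : \tau_2 \Rightarrow \mathrm{unflat}[|\tau_2|](\mathrm{proj2}[\mathrm{Flat}[|\tau_1|]][\mathrm{Flat}[|\tau_2|]]\ e')}$$

These translations use the auxiliary term functions unflat, proj1, and proj2, which are defined as follows: The unflat function is the inverse of flat. It has type

$$\mathrm{unflat} : \forall t{::}\Omega.\ T(\mathrm{Flat}[t]) \to T(t),$$

and is defined as:

$$\begin{array}{l}
\mathrm{unflat} = \Lambda t{::}\Omega.\ \mathrm{typecase}\ t\ \mathrm{of} \\
\qquad\ \ \mathrm{Struct}(t) \Rightarrow \lambda x{:}T(\mathrm{Struct}(t)).\ x \\
\qquad |\ \_ \Rightarrow \lambda x{:}T(\mathrm{Struct}(t)).\ \mathrm{head}\ x
\end{array}$$

The proj1 function extracts the first components of a struct, corresponding to its first argument list of constructors. Similar to append, proj1 is defined using the term-level listrec:

$$\begin{array}{l}
\mathrm{proj1} : \forall t_1{::}\Omega^*.\forall t_2{::}\Omega^*.\ T(\mathrm{Struct}(\mathrm{Append}[t_1][t_2])) \to T(\mathrm{Struct}(t_1)) \\
\mathrm{proj1}[\mathrm{Nil}_\Omega][t] = \lambda x{:}T(\mathrm{Struct}(t)).\ \mathrm{struct}\{\} \\
\mathrm{proj1}[\mathrm{Cons}(t_1, t_2)][t] = \lambda x{:}T(\mathrm{Struct}(\mathrm{Append}[\mathrm{Cons}(t_1, t_2)][t])). \\
\qquad \mathrm{cons}(\mathrm{head}\ x,\ \mathrm{proj1}[t_2][t](\mathrm{tail}\ x))
\end{array}$$

The proj2 function extracts the latter components of a struct, corresponding to its second argument list of constructors.

$$\begin{array}{l}
\mathrm{proj2} : \forall t_1{::}\Omega^*.\forall t_2{::}\Omega^*.\ T(\mathrm{Struct}(\mathrm{Append}[t_1][t_2])) \to T(\mathrm{Struct}(t_2)) \\
\mathrm{proj2}[\mathrm{Nil}_\Omega][t] = \lambda x{:}T(\mathrm{Struct}(t)).\ x \\
\mathrm{proj2}[\mathrm{Cons}(t_1, t_2)][t] = \lambda x{:}T(\mathrm{Struct}(\mathrm{Append}[\mathrm{Cons}(t_1, t_2)][t])). \\
\qquad \mathrm{proj2}[t_2][t](\mathrm{tail}\ x)
\end{array}$$

The crucial step in showing that the term translations of projections respect the type translation is showing that

$$\mathrm{Append}[\mathrm{Cons}(\mu_1, \mu_2)][\mu] \equiv \mathrm{Cons}(\mu_1, \mathrm{Append}[\mu_2][\mu]),$$

which follows directly from the definition of Append. This allows us to argue that the inductive cases are well-formed.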
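The term-level helpers can be exercised together in a short sketch. It assumes struct values are Python tuples, constructor lists are Python tuples of tagged tuples, and the wrappers `pair`, `fst`, and `snd` (hypothetical names) stand for the translations of tuple creation and of the two projections at already-translated component types.

```python
INT, FLOAT = ("Int",), ("Float",)

def Struct(components):
    return ("Struct", tuple(components))

def flat_con(t):
    return t[1] if t[0] == "Struct" else (t,)

def flat(t, v):
    """A struct value is returned unchanged; any other value is wrapped."""
    return v if t[0] == "Struct" else (v,)

def unflat(t, s):
    """Inverse of flat: unwrap a singleton struct unless t is itself a Struct."""
    return s if t[0] == "Struct" else s[0]

def append(l1, l2, x, y):
    if not l1:
        return y
    return (x[0],) + append(l1[1:], l2, x[1:], y)   # cons(head x, append (tail x) y)

def proj1(l1, l2, x):
    if not l1:
        return ()
    return (x[0],) + proj1(l1[1:], l2, x[1:])       # cons(head x, proj1 (tail x))

def proj2(l1, l2, x):
    if not l1:
        return x
    return proj2(l1[1:], l2, x[1:])                 # skip the first list's components

def pair(t1, t2, v1, v2):
    return append(flat_con(t1), flat_con(t2), flat(t1, v1), flat(t2, v2))

def fst(t1, t2, s):
    return unflat(t1, proj1(flat_con(t1), flat_con(t2), s))

def snd(t1, t2, s):
    return unflat(t2, proj2(flat_con(t1), flat_con(t2), s))
```

All three recursive helpers recur on the first constructor list only, which mirrors why the Append equivalence for the Cons case is the step that makes the inductive cases type-check.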

One advantage of explicitly flattening structs in the target language is that we can export a type-safe form of casting to the source level. I call such a cast a view. Let us define two Mini-ML types $\tau_1$ and $\tau_2$ to be similar, $\tau_1 \sim \tau_2$, iff they have the same representations, that is, iff $|\tau_1|$ is definitionally equivalent to $|\tau_2|$. If $\tau_1 \sim \tau_2$ in the source language, then it is possible to safely view any source $\tau_1$ expression as having type $\tau_2$ and vice versa. In particular, given the flattening translation above, any two source tuple types that are equivalent modulo associativity of the tuple constructor have translations that are definitionally equivalent. Thus, if $v : \tau_1 \times (\tau_2 \times \tau_3)$, we can safely view v as having type $(\tau_1 \times \tau_2) \times \tau_3$.

Because I represent equivalent source types using equivalent target types, no coercion needs to take place when viewing a value with a different, but similar type. Hence, this approach to views, unlike coercion-based approaches, is compatible with mutable types (i.e., arrays and refs) in the sense that $\mathrm{array}[\tau_1] \sim \mathrm{array}[\tau_2]$ whenever $\tau_1 \sim \tau_2$. This means we may freely intermingle updates with views of complex data structures, capturing some of the expressiveness of C casts without sacrificing type-safety.

It is possible to define more sophisticated translations that, for instance, insert padding to ensure that each element of a struct lies on a multiple-of-eight (i.e., quad-word) boundary, assuming the struct is allocated on an aligned boundary. For example, we can modify Append to insert padding (a pointer to an empty struct) between non-Float components:

$$\begin{array}{l}
\mathrm{Append}'[\mathrm{Nil}_\Omega][\mu] = \mu \\
\mathrm{Append}'[\mathrm{Cons}(\mathrm{Float}, \mu_2)][\mu] = \mathrm{Cons}(\mathrm{Float}, \mathrm{Append}'[\mu_2][\mu]) \\
\mathrm{Append}'[\mathrm{Cons}(\mu_1, \mu_2)][\mu] = \mathrm{Cons}(\mu_1, \mathrm{Cons}(\mathrm{Struct}(\mathrm{Nil}_\Omega), \mathrm{Append}'[\mu_2][\mu]))
\end{array}$$

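Alternatively, we might split the float and non-float components of a struct to avoid padding altogether. This yields the following alternative type translation:

$$|(\tau_1 \times \tau_2)| = \mathrm{Struct}(\mathrm{Append}[\mathrm{Split}[\mathrm{Flat}[|\tau_1|]]][\mathrm{Split}[\mathrm{Flat}[|\tau_2|]]])$$

where Split is defined as

$$\begin{array}{lcl}
\mathrm{Split}[t] & = & \mathrm{Split}'[t][\mathrm{Nil}_\Omega][\mathrm{Nil}_\Omega] \\[4pt]
\mathrm{Split}'[\mathrm{Nil}_\Omega][t_f][t] & = & \mathrm{Append}[\mathrm{Rev}[t_f]][\mathrm{Rev}[t]] \\
\mathrm{Split}'[\mathrm{Cons}(\mathrm{Float}, t_2)][t_f][t] & = & \mathrm{Split}'[t_2][\mathrm{Cons}(\mathrm{Float}, t_f)][t] \\
\mathrm{Split}'[\mathrm{Cons}(t_1, t_2)][t_f][t] & = & \mathrm{Split}'[t_2][t_f][\mathrm{Cons}(t_1, t)] \\[4pt]
\mathrm{Rev}[t] & = & \mathrm{Rev}'[t][\mathrm{Nil}_\Omega] \\[4pt]
\mathrm{Rev}'[\mathrm{Nil}_\Omega][t] & = & t \\
\mathrm{Rev}'[\mathrm{Cons}(t_1, t_2)][t] & = & \mathrm{Rev}'[t_2][\mathrm{Cons}(t_1, t)]
\end{array}$$

This translation maps the Mini-ML type

$$int \times (float \times (int \times (float \times int)))$$

to the target type struct{float, float, int, int, int}. Assuming that such values are allocated on quad-word boundaries, the floating-point components will always be aligned.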
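The list-level behaviour of Split, Rev, and the padding variant Append' can be checked directly. The sketch assumes constructor lists are Python tuples of tagged tuples, and it applies `split` to an already flattened component list rather than running the full type translation; the function names are hypothetical.

```python
FLOAT, INT = ("Float",), ("Int",)
PAD = ("Struct", ())    # Struct(Nil): the empty struct used as padding

def rev(lst, acc=()):
    """Rev' with an accumulator: Rev'[Cons(t1, t2)][t] = Rev'[t2][Cons(t1, t)]."""
    if not lst:
        return acc
    return rev(lst[1:], (lst[0],) + acc)

def split(lst, floats=(), others=()):
    """Split': accumulate Float and non-Float components separately, in reverse."""
    if not lst:
        return rev(floats) + rev(others)          # Append[Rev[t_f]][Rev[t]]
    if lst[0] == FLOAT:
        return split(lst[1:], (FLOAT,) + floats, others)
    return split(lst[1:], floats, (lst[0],) + others)

def append_padded(l1, l2):
    """Append': follow each non-Float component of l1 with a padding slot."""
    if not l1:
        return l2
    if l1[0] == FLOAT:
        return (FLOAT,) + append_padded(l1[1:], l2)
    return (l1[0], PAD) + append_padded(l1[1:], l2)
```

Applied to the flat list int, float, int, float, int, `split` moves both floats to the front while preserving the relative order within each group.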

There is, of course, a limit to the transformations that can be coded, since definitional equivalence of constructors must be decidable so that, in turn, type-checking remains decidable. Nevertheless, the range of transformations that $\lambda_i^{ML}$-like languages can support seems to cover a wide variety of the interesting cases.

5.4.2  Type  Classes 

The  programming  language  Haskell  [68]  gives  the  programmer  the  ability  to  define  a 
class  of  types  with  associated  operations  called  methods.  The  canonical  example  is  the 
class  of  types  that  admit  equality  (also  known  as  equality  types  in  SML  [90]).  The  class 
of  equality  types  includes  primitive  types,  such  as  int  and  float,  that  have  a  primitive 
notion  of  equality.  Equality  types  also  include  data  structures,  such  as  tuples,  when  the 
component  types  are  equality  types.  However,  equality  types  exclude  arrow  types  since 
determining  whether  two  functions  are  extensionally  equivalent  is  generally  undecidable. 

The peq operation of Section 5.2.3 effectively defines an equality method for equality types. However, the definition includes a case for arrow types, because the type of peq is:

$$\forall t{::}\Omega.\ [T(t), T(t)] \to int$$

and t ranges over all monotypes, not just the class of equality types. We would like to restrict peq so that only equality types can be passed to it.

SML accomplishes such a restriction for its polymorphic equality operation by having two classes of type variables: normal type variables (e.g., $\alpha$) may be instantiated with any monotype, and equality type variables (e.g., $''\alpha$) may only be instantiated with equality types. The polymorphic equality operation is assigned the SML type:

$$\forall ''\alpha.\ ''\alpha \times ''\alpha \to bool$$

Hence, only equality types may instantiate the polymorphic equality primitive. In particular, an SML type checker will reject the following expression:

fn (x : α → β, y : α → β) => x = y.

Haskell generalizes this sort of restriction by qualifying bound type variables with a user-defined predicate or predicates (e.g., is_eqty(α))³. Another approach, suggested by Duggan [38], is to refine the kind of the bound type variable, much as Freeman and Pfenning suggest refinements of SML datatypes [44].

3See  Jones  [72,  71]  for  a  general  formulation  of  qualified  types. 
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However,  it  is  possible  to  encode  type  classes  to  a  limited  degree  using  Typerec.  In  this 
section,  I  demonstrate  the  encoding  by  sketching  how  the  equality  types  of  SML  can  be 
simulated  in  \f,L .  The  basic  idea  is  to  represent  the  type  class  as  a  constructor  function 
that  maps  equality  types  to  themselves,  and  non-equality  types  to  the  distinguished 
constructor  Void.  This  Void  constructor  corresponds  to  the  void  type,  which  is  empty. 
That  is,  there  are  no  closed  values  of  type  void. 

As  an  example,  the  class  of  equality  types  is  encoded  by  the  following  constructor 
function  Eq,  which  is  defined  in  terms  of  an  auxiliary  function,  Eq’: 

Eq  ::  O  — ^  0 


Eq  [t]  =  (Eq'[f])[f] 

Eq'  ::  Q  — >  (Q  — >  Q) 

Eq'[lnt] 

Eq'[Float] 

Eq'[Unit] 

Eq'[Prod(ti,f2)] 

Eq'[Arrow(fi,  •  •  ■  ,tk,t')] 
Eq'jVoid] 


A  twCt.t 
YtwCt.t 
YtwPt.t 

At::O.Eq/[t2] (Eq'[ti]  t ) 

At::Q.Void 

Af::Q.Void 


The Eq' function returns the identity function on $\Omega$ if its argument is an equality type.
Otherwise, Eq' returns the function that maps every monotype to Void. Therefore,
(Eq'[$\mu$])[$\mu$] returns $\mu$ whenever $\mu$ is an equality type, and Void otherwise. In essence,
Eq' serves as an “if-then-else” construct that checks to see if $\mu$ is an equality type, and
if so returns $\mu$.
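The behaviour of Eq and Eq' can be mimicked outside the calculus. The following Python sketch is only an illustration under stated assumptions: the `Con` record, the helper names `prod`, `arrow`, `eq_prime`, and `eq` are hypothetical, and ordinary Python functions stand in for constructor-level $\lambda$-abstractions. It is not the Typerec encoding itself, merely a model of the clauses given above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


# Hypothetical encoding of monotype constructors as Python values.
@dataclass(frozen=True)
class Con:
    tag: str                       # "Int", "Float", "Unit", "Prod", "Arrow", "Void"
    args: Tuple["Con", ...] = ()


INT, FLOAT, UNIT, VOID = Con("Int"), Con("Float"), Con("Unit"), Con("Void")


def prod(a: Con, b: Con) -> Con:
    return Con("Prod", (a, b))


def arrow(*ts: Con) -> Con:
    # Arrow(t1, ..., tk, t'): the last argument is the result constructor.
    return Con("Arrow", ts)


def eq_prime(t: Con) -> Callable[[Con], Con]:
    """Model of Eq'[t]: the identity on constructors when t is an equality
    type, and the constant-Void function otherwise."""
    if t.tag in ("Int", "Float", "Unit"):
        return lambda u: u
    if t.tag == "Prod":
        t1, t2 = t.args
        # Eq'[Prod(t1,t2)] = \t. Eq'[t2](Eq'[t1] t)
        return lambda u: eq_prime(t2)(eq_prime(t1)(u))
    # The Arrow and Void clauses both yield the constant-Void function.
    return lambda u: VOID


def eq(t: Con) -> Con:
    """Model of Eq[t] = (Eq'[t])[t]."""
    return eq_prime(t)(t)
```

Under this model, a product is mapped to itself only when both components are equality types; a single arrow anywhere in the product collapses the whole constructor to Void.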

Now we can write the polymorphic equality function as follows:

\[
\begin{array}{lcl}
\mathsf{peq}[\mathsf{Int}] & = & \lambda[x_1{:}\mathsf{int},\ x_2{:}\mathsf{int}].\ \mathsf{eqint}(x_1, x_2) \\
\mathsf{peq}[\mathsf{Float}] & = & \lambda[x_1{:}\mathsf{float},\ x_2{:}\mathsf{float}].\ \mathsf{eqfloat}(x_1, x_2) \\
\mathsf{peq}[\mathsf{Unit}] & = & \lambda[x_1{:}\mathsf{unit},\ x_2{:}\mathsf{unit}].\ 1 \\
\mathsf{peq}[\mathsf{Prod}(t_a,t_b)] & = & \lambda[x_1{:}T(\mathsf{Prod}(t_a,t_b)),\ x_2{:}T(\mathsf{Prod}(t_a,t_b))]. \\
 & & \quad \mathsf{if0}\ \mathsf{peq}[t_a][\pi_1\,x_1,\ \pi_1\,x_2]\ \mathsf{then}\ 0\ \mathsf{else}\ \mathsf{peq}[t_b][\pi_2\,x_1,\ \pi_2\,x_2] \\
\mathsf{peq}[\mathsf{Arrow}(t_1,\cdots,t_k,t)] & = & \lambda[x_1{:}\mathsf{void},\ x_2{:}\mathsf{void}].\ 0
\end{array}
\]

and the term can be assigned the following type:

\[ \forall t{::}\Omega.\ [T(\mathsf{Eq}[t]),\ T(\mathsf{Eq}[t])] \to \mathsf{int} \]

Consequently, peq[$\mu$][$e_1$, $e_2$] is well-formed only if $e_1$ and $e_2$ have type $T(\mu)$ and $\mu$ is an
equality type. In particular, the expression

\[ \mathsf{peq}[\mathsf{Arrow}([\mathsf{Int}], \mathsf{Int})][\lambda x{:}\mathsf{int}.x,\ \lambda x{:}\mathsf{int}.x] \]

is ill-typed because peq applied to an Arrow constructor has type [void, void] $\to$ int and
thus cannot be applied to the two functions of type int $\to$ int. The encoding is not entirely
satisfactory because peq can be applied to an Arrow constructor. However, the resulting
expression can only be applied to arguments of type void. Since there are no closed values
of type void, the resulting expression can never be invoked. Thus, an optimizer can safely
replace the Arrow-clause of peq with some polymorphic constant (e.g., error).
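To make the dispatch concrete, here is a small Python sketch of an equality function selected by a run-time type representation. It is an assumption-laden illustration rather than the typerec term itself: type representations are encoded as tagged tuples (a hypothetical choice), Python performs no static check, and the arrow clause uses the "error" replacement mentioned above instead of returning 0.

```python
# Hypothetical run-time type representations as tagged tuples:
# ("Int",), ("Float",), ("Unit",), ("Prod", ta, tb), ("Arrow", t1, ..., t).
INT, FLOAT, UNIT = ("Int",), ("Float",), ("Unit",)


def peq(t):
    """Return a two-argument equality test chosen by the type representation t.
    Following the text, 1 means equal and 0 means not equal."""
    tag = t[0]
    if tag in ("Int", "Float"):
        # Stand-ins for the primitives eqint and eqfloat.
        return lambda x1, x2: 1 if x1 == x2 else 0
    if tag == "Unit":
        return lambda x1, x2: 1
    if tag == "Prod":
        _, ta, tb = t
        eq_a, eq_b = peq(ta), peq(tb)
        # if0 peq[ta][pi1 x1, pi1 x2] then 0 else peq[tb][pi2 x1, pi2 x2]
        return lambda x1, x2: 0 if eq_a(x1[0], x2[0]) == 0 else eq_b(x1[1], x2[1])
    if tag == "Arrow":
        # In the typed calculus this clause can never be invoked, because its
        # arguments would need type void; here it is replaced by an error.
        def unreachable(x1, x2):
            raise TypeError("peq at an arrow type: no closed values of type void")
        return unreachable
    raise ValueError(f"unknown constructor {tag!r}")
```

A dynamically typed host language can only detect the misuse when the arrow clause is actually called, whereas the type $\forall t{::}\Omega.\ [T(\mathsf{Eq}[t]), T(\mathsf{Eq}[t])] \to \mathsf{int}$ rules it out before the program runs.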

This encoding suggests the following type translation for SML: wrap each occurrence
of an equality type variable with the Eq constructor function.

\[
\begin{array}{rcl}
|t| & = & t \\
|{''}t| & = & \mathsf{Eq}[{''}t] \\
|\forall t_1,\cdots,t_n,{''}t_1,\cdots,{''}t_m.\tau| & = & \forall t_1,\cdots,t_n,{''}t_1,\cdots,{''}t_m.\ T(|\tau|)
\end{array}
\]

However, when instantiating a polymorphic function we must use an auxiliary translation
that does not wrap the equality variables with Eq:

\[
\begin{array}{rcl}
\|t\| & = & t \\
\|{''}t\| & = & {''}t
\end{array}
\]

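This auxiliary translation is needed because the translation above does not commute with
substitution of equality types for equality type variables (i.e., an extra “Eq” gets wrapped
around each equality type variable). Hence, the translation of polymorphic instantiation
becomes:

\[
(\text{tapp})\quad
\frac{
\begin{array}{c}
\Delta \vdash \tau_1 \ \cdots\ \Delta \vdash \tau_n \qquad \Delta \vdash {''}\tau_1 \ \cdots\ \Delta \vdash {''}\tau_m \\
\Delta;\Gamma \vdash v : \forall t_1,\cdots,t_n,{''}t_1,\cdots,{''}t_m.\tau \Rightarrow v'
\end{array}
}{
\begin{array}{c}
\Delta;\Gamma \vdash v\ [\tau_1,\cdots,\tau_n,{''}\tau_1,\cdots,{''}\tau_m] : \{\tau_1/t_1,\cdots,\tau_n/t_n,{''}\tau_1/{''}t_1,\cdots,{''}\tau_m/{''}t_m\}\tau \\
\Rightarrow\ v'\ [\|\tau_1\|,\cdots,\|\tau_n\|,\|{''}\tau_1\|,\cdots,\|{''}\tau_m\|]
\end{array}
}
\]

It is easy to verify that the type translation commutes with type substitution,

\[ \{\|\tau'\|/t\}\,|\tau| = |\{\tau'/t\}\tau|, \]

and thus the resulting term has the appropriate type.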

In this fashion, Typerec can be used to encode equality types and other type classes,
whereas typerec can be used to implement the methods (i.e., peq) of the class. The
information encoded in the type class can be used by a compiler to eliminate unneeded
cases within methods.

5.4.3  Communication  Primitives

Ohori and Kato give an extension of ML with primitives for communication in a
distributed, heterogeneous environment [100]. Their extension has two essential features:
one is a mechanism for generating globally unique names (“handles” or “capabilities”)
that are used as proxies for functions provided by servers. The other is a method for
representing arbitrary values in a form suitable for transmission through a network. Integers
are considered transmissible, as are pairs of transmissible values, but functions cannot be
transmitted (due to the heterogeneous environment) and are thus represented by proxy.
These proxies are associated with their functions by a name server that may be contacted
through a primitive addressing scheme. In this section I sketch how a variant of Ohori
and Kato’s representation scheme can be implemented using dynamic type dispatch.

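To accommodate Ohori and Kato’s primitives, I extend $\lambda^{ML}_i$-Rep with a primitive
constructor Proxy of kind $\Omega \to \Omega$ and a corresponding type constructor proxy($\sigma$), linked
by the equation $T(\mathsf{Proxy}(\mu)) = \mathsf{proxy}(T(\mu))$. The Typerec and typerec primitives are
extended in the obvious way to account for constructors of the form Proxy($\mu$).

Next, I add primitives proxy and rpc with the following types:

\[
\begin{array}{lcl}
\mathsf{proxy} & : & \forall t_1,t_2{::}\Omega.\ (T(\mathsf{Tran}[t_1]) \to T(\mathsf{Tran}[t_2])) \to T(\mathsf{Tran}[\mathsf{Arrow}(t_1,t_2)]) \\
\mathsf{rpc} & : & \forall t_1,t_2{::}\Omega.\ (T(\mathsf{Tran}[\mathsf{Arrow}(t_1,t_2)])) \to T(\mathsf{Tran}[t_1]) \to T(\mathsf{Tran}[t_2])
\end{array}
\]

where Tran is a constructor coded using Typerec as follows:

\[
\begin{array}{lcl}
\mathsf{Tran} & :: & \Omega \to \Omega \\
\mathsf{Tran}[\mathsf{Int}] & = & \mathsf{Int} \\
\mathsf{Tran}[\mathsf{Float}] & = & \mathsf{Float} \\
\mathsf{Tran}[\mathsf{Unit}] & = & \mathsf{Unit} \\
\mathsf{Tran}[\mathsf{Prod}(t_1,t_2)] & = & \mathsf{Prod}(\mathsf{Tran}[t_1], \mathsf{Tran}[t_2]) \\
\mathsf{Tran}[\mathsf{Arrow}(t_1,t_2)] & = & \mathsf{Proxy}(\mathsf{Arrow}(t_1,t_2)) \\
\mathsf{Tran}[\mathsf{Proxy}(t)] & = & \mathsf{Proxy}(t)
\end{array}
\]

The constructor Tran[$\mu$] maps $\mu$ to a constructor where each arrow is wrapped by a Proxy
constructor. Thus, values of type $T(\mathsf{Tran}[\mu])$ do not contain functions and are therefore
transmissible.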
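The clauses of Tran can be read as a structural recursion over constructors. The Python sketch below follows those clauses one for one; the tagged-tuple encoding of constructors and the helper `has_bare_arrow` are hypothetical names introduced only for this illustration.

```python
# Hypothetical tagged-tuple encoding of constructors:
# ("Int",), ("Float",), ("Unit",), ("Prod", t1, t2), ("Arrow", t1, t2), ("Proxy", t).
INT, FLOAT, UNIT = ("Int",), ("Float",), ("Unit",)


def tran(t):
    """Model of Tran[t]: wrap every arrow in a Proxy, leave base types alone."""
    tag = t[0]
    if tag in ("Int", "Float", "Unit"):
        return t
    if tag == "Prod":
        return ("Prod", tran(t[1]), tran(t[2]))
    if tag == "Arrow":
        # Clause from the text: Tran[Arrow(t1,t2)] = Proxy(Arrow(t1,t2)).
        return ("Proxy", t)
    if tag == "Proxy":
        return t
    raise ValueError(f"unknown constructor {tag!r}")


def has_bare_arrow(t):
    """True if an Arrow occurs in t outside of any Proxy wrapper."""
    tag = t[0]
    if tag == "Arrow":
        return True
    if tag == "Prod":
        return has_bare_arrow(t[1]) or has_bare_arrow(t[2])
    return False  # base types and Proxy(...) are transmissible as they stand
```

In this model the result of `tran` never contains an arrow outside a Proxy, which is the run-time counterpart of the claim that values of type $T(\mathsf{Tran}[\mu])$ contain no functions.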

The proxy primitive takes a function between transmissible values, generates a new,
globally unique proxy and tells the name server to associate that proxy with the function.
For example, the proxy might consist of the machine’s name paired with the address of
the function. Conversely, the rpc operation takes a proxy of a function and a transmissible
argument value. Then, the operation contacts the name server to find the function
corresponding to the proxy. When the function is found, the argument value is sent to
the appropriate machine. Then, the function associated with the proxy is applied to
the argument, and the result of the function is transmitted back as the result of the
operation. Thus, proxy maps a function on transmissible representations to a transmissible
representation of the function, whereas rpc maps a transmissible representation of
a function to a function on transmissible representations.

The goal of Ohori and Kato’s compilation was to provide transparent communication.
That is, given any function f of type $T(\mu) \to T(\mu')$, their goal was to be able to transmit
a representation of f to a remote site. We cannot obtain a proxy for f directly, because
proxies require functions that take and return transmissible representations. Therefore,
the key to transparent communication is a function marshal that coerces f to take and
return transmissible values. In general, we want marshal to take any value and convert
it to a transmissible representation.

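I can write marshal using typerec. The definition requires a dual function,
unmarshal, to accommodate function arguments$^4$.

\[
\begin{array}{lcl}
\mathsf{marshal} & : & \forall t{::}\Omega.\ T(t) \to T(\mathsf{Tran}[t]) \\
\mathsf{marshal}[\mathsf{Int}] & = & \lambda x{:}\mathsf{int}.\ x \\
\mathsf{marshal}[\mathsf{Float}] & = & \lambda x{:}\mathsf{float}.\ x \\
\mathsf{marshal}[\mathsf{Unit}] & = & \lambda x{:}\mathsf{unit}.\ x \\
\mathsf{marshal}[\mathsf{Prod}(t_1,t_2)] & = & \lambda x{:}T(\mathsf{Prod}(t_1,t_2)). \\
 & & \quad (\mathsf{marshal}[t_1](\pi_1\,x),\ \mathsf{marshal}[t_2](\pi_2\,x)) \\
\mathsf{marshal}[\mathsf{Arrow}(t_1,t_2)] & = & \lambda f{:}T(\mathsf{Arrow}(t_1,t_2)). \\
 & & \quad \mathsf{proxy}[t_1][t_2] \\
 & & \qquad (\lambda x{:}T(\mathsf{Tran}[t_1]).\ \mathsf{marshal}[t_2](f\ (\mathsf{unmarshal}[t_1]\ x))) \\
\mathsf{marshal}[\mathsf{Proxy}(t)] & = & \lambda x{:}T(\mathsf{Proxy}(t)).\ x
\end{array}
\]

\[
\begin{array}{lcl}
\mathsf{unmarshal} & : & \forall t{::}\Omega.\ T(\mathsf{Tran}[t]) \to T(t) \\
\mathsf{unmarshal}[\mathsf{Int}] & = & \lambda x{:}\mathsf{int}.\ x \\
\mathsf{unmarshal}[\mathsf{Float}] & = & \lambda x{:}\mathsf{float}.\ x \\
\mathsf{unmarshal}[\mathsf{Unit}] & = & \lambda x{:}\mathsf{unit}.\ x \\
\mathsf{unmarshal}[\mathsf{Prod}(t_1,t_2)] & = & \lambda x{:}T(\mathsf{Tran}[\mathsf{Prod}(t_1,t_2)]). \\
 & & \quad (\mathsf{unmarshal}[t_1](\pi_1\,x),\ \mathsf{unmarshal}[t_2](\pi_2\,x)) \\
\mathsf{unmarshal}[\mathsf{Arrow}(t_1,t_2)] & = & \lambda f{:}T(\mathsf{Proxy}(\mathsf{Arrow}(\mathsf{Tran}[t_1],\mathsf{Tran}[t_2]))).\ \lambda x{:}T(t_1). \\
 & & \quad \mathsf{unmarshal}[t_2]\,(\mathsf{rpc}[t_1][t_2]\ f\ (\mathsf{marshal}[t_1]\ x)) \\
\mathsf{unmarshal}[\mathsf{Proxy}(t)] & = & \lambda x{:}T(\mathsf{Proxy}(t)).\ x
\end{array}
\]

4 Technically, I must calculate marshal and unmarshal with one typerec and return a tuple containing
the two functions.

At arrow types, marshal converts the given function to one that takes and returns
transmissible types, and then allocates a new proxy for the resulting function. Conversely,
unmarshal takes a proxy and a marshaled argument, performs an rpc on the proxy, and
then unmarshals the result.

With marshal and unmarshal, I can dynamically convert a value to and from its
transmissible representation. In effect, these terms reify the stub compilers of traditional
RPC systems (e.g., the Mach Interface Generator for Mach RPC [70, 123]). Similarly,
we can code general-purpose print and read routines within $\lambda^{ML}_i$, in order to achieve
the easy input/output of languages like Lisp and Scheme.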
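The interplay of marshal, unmarshal, proxy, and rpc can be exercised in a single process. The following Python sketch is a simplified model under explicit assumptions: the "name server" is just a dictionary, handles are hypothetical `("proxy", n)` tuples, no network transmission takes place, and type representations are tagged tuples. It illustrates the recursion pattern of the definitions above, not Ohori and Kato's system.

```python
import itertools

# Hypothetical tagged-tuple type representations.
INT, FLOAT, UNIT = ("Int",), ("Float",), ("Unit",)

_name_server = {}           # stand-in for the name server: handle -> function
_fresh = itertools.count()  # source of unique handle numbers


def proxy(f):
    """Register f with the name server and return a transmissible handle."""
    handle = ("proxy", next(_fresh))
    _name_server[handle] = f
    return handle


def rpc(handle, arg):
    """Look up the function for a handle and apply it to a transmissible argument."""
    return _name_server[handle](arg)


def marshal(t):
    tag = t[0]
    if tag in ("Int", "Float", "Unit", "Proxy"):
        return lambda x: x
    if tag == "Prod":
        m1, m2 = marshal(t[1]), marshal(t[2])
        return lambda x: (m1(x[0]), m2(x[1]))
    if tag == "Arrow":
        _, t1, t2 = t
        # Wrap f so that it takes and returns transmissible values, then proxy it.
        return lambda f: proxy(lambda x: marshal(t2)(f(unmarshal(t1)(x))))
    raise ValueError(f"unknown constructor {tag!r}")


def unmarshal(t):
    tag = t[0]
    if tag in ("Int", "Float", "Unit", "Proxy"):
        return lambda x: x
    if tag == "Prod":
        u1, u2 = unmarshal(t[1]), unmarshal(t[2])
        return lambda x: (u1(x[0]), u2(x[1]))
    if tag == "Arrow":
        _, t1, t2 = t
        # Turn a handle back into a function that performs an rpc per call.
        return lambda h: (lambda x: unmarshal(t2)(rpc(h, marshal(t1)(x))))
    raise ValueError(f"unknown constructor {tag!r}")
```

For example, marshaling a function of type `("Arrow", ("Prod", INT, INT), INT)` yields a handle that is plain data, and unmarshaling that handle yields a callable that behaves like the original function. Higher-order arguments work the same way because marshal and unmarshal call each other at arrow types.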


5.5  Related  Work 

Peyton Jones and Launchbury suggested an approach to unboxed integers and reals
in the context of a lazy language [75]. However, they restricted unboxed types from
instantiating type variables. A similar idea was recently proposed by Ohori [101] to
compile polymorphic languages such as SML.

Leroy suggested the coercion-based approach to allow unrestricted instantiation of
type variables [81], and later, Poulsen extended his work to accommodate unboxed
datatypes that do not “escape” [102]. Henglein and Jørgensen examined techniques for
eliminating coercions at compile-time. Shao and Appel [110, 108] took the ideas of Leroy
and extended them to the full Standard ML language. Thiemann extended the work
of Leroy to keep some values unboxed even within polymorphic functions [118]. None
of these approaches supports unboxed mutable data, or generally unboxed datatypes.
Furthermore, they do not address type classes, marshaling, or garbage collection.

Of a broadly similar nature is the work on “soft” type systems [64, 7, 29, 132].
Here, ML-style type inference or set constraints are used to eliminate type-tag checks in
dynamically typed languages such as Scheme.

Morrison,  et  al.  [97]  described  an  implementation  of  Napier  that  passed  types  at 
run  time  to  determine  the  behavior  of  polymorphic  operations.  However,  the  actual 
transformations  performed  were  not  described  and  there  was  little  or  no  analysis  of  the 
typing  properties  or  performance  of  the  resulting  code.  The  work  of  Ohori  on  compiling 
record  operations  [99]  is  similarly  based  on  a  type-passing  interpretation  and  provided 
much  of  the  inspiration  of  this  work.  Type  passing  was  also  used  by  Aditya  and  Caro  in  an 
implementation  of  Id,  so  that  instantiations  of  polymorphic  types  could  be  reconstructed 
for  debugging  purposes  [5]. 

Jones [72, 71] has proposed a general framework for passing data derived from types
to “qualified” polymorphic operations, called evidence passing. He shows how evidence
passing can be used to implement Haskell-style type classes, generalizing the earlier work
of Wadler and Blott [122]. He also shows how Ohori-style record calculi can be implemented
with evidence passing. Jones’s approach subsumes type passing in that functions
or types or any evidence derived from qualified types could, in principle, be passed to
polymorphic operations. However, qualified types represent predicates on types, whereas
the type system of $\lambda^{ML}_i$ supports computations that transform types. For example, it is
not possible to express the transmissible representation or a flattened representation of
a type in Jones’s framework.

Recently,  Duggan  and  Ophel  [38]  and  Thatte  [116]  have  independently  suggested 
semantics  for  type  classes  that  are  similar  in  spirit  to  my  proposal.  In  one  sense,  these 
proposals  do  a  better  job  of  enforcing  type  classes,  since  they  restrict  the  kinds  of  type 
variables.  However,  like  Jones’s  qualified  types,  neither  of  these  approaches  can  express 
transformations  on  types. 

Dubois, Rouaix, and Weis formulated an approach to polymorphism dubbed
“extensional polymorphism” [37]$^5$. The goal was to provide a framework to type check ad
hoc operators like polymorphic equality. As with $\lambda^{ML}_i$, their formulation requires that
some types be passed at runtime and be examined using what amounts to a structural
induction elimination form. Their approach is fairly general since it is not restricted to
primitive recursion over monotypes. However, type checking for the language is in general
undecidable and type errors can occur at run time. Furthermore, like the approaches
to type classes, there is no facility for transforming types.

Marshalling  in  languages  with  abstract  or  polymorphic  types  has  been  the  subject 
of  much  research  [85,  86,  84,  66,  27,  4,  100,  77].  The  solution  I  propose  does  not  easily 
extend  to  user-defined  abstract  types  (as  with  Herlihy  and  Liskov  [66]).  However,  none 
of  these  previous  approaches  are  able  to  express  the  relationship  between  a  value’s  type 
and  its  transmissible  representation,  whereas  I  am  able  to  express  this  relationship  as  a 
constructor  function  (i.e.,  Tran). 


5 Originally, Harper and I termed the type analysis of $\lambda^{ML}_i$ “intensional polymorphism”.


Chapter  6 

Typed  Closure  Conversion 


In  the  previous  chapters,  I  argued  that  types  and  dynamic  type  dispatch  are  important 
for  compiling  programming  languages  like  Mini-ML.  I  showed  that  types  can  be  used  to 
direct  compilation  in  choosing  primitive  operations,  data  structure  layout,  and  calling 
conventions,  and  that  types  can  direct  a  proof  that  a  compiler  is  correct. 

If we are to use types at run time for dynamic type dispatch, we must propagate
type information all the way through the lowest levels of a compiler. This is one reason
why type-preserving transformations, such as the translation from Mini-ML to $\lambda^{ML}_i$-Rep
of Chapter 5, are so important.

In this chapter, I present a particularly important stage of compilation for functional
programming languages known as closure conversion. To my knowledge, no one (besides
Yasuhiko Minamide, Robert Harper and myself [92, 91]) has presented a type-preserving
closure conversion phase, especially for type-passing polymorphic languages. Therefore,
it is important to show that such a translation exists if I am to claim that my type-based
implementation approach is viable. In this chapter, I show how to closure convert
$\lambda^{ML}_i$-Rep using abstract closures. Minamide, Harper, and Morrisett provide further details on
environment representations and how to represent closures [92, 91].

I begin by giving an overview of closure conversion and why it is an important part
of functional language implementation. I then define a target language called $\lambda^{ML}_i$-Close,
which is a variant of $\lambda^{ML}_i$-Rep that provides explicit facilities for constructing closures and
their environments. Next, I give a type-directed and type-preserving closure transform
from $\lambda^{ML}_i$-Rep to $\lambda^{ML}_i$-Close and prove that it is correct using the same methodology I
used to compile Mini-ML to $\lambda^{ML}_i$-Rep.

6.1  An  Overview  of  Closure  Conversion


Standard operational models of programming languages based on the $\lambda$-calculus, such
as the contextual semantics of Mini-ML and $\lambda^{ML}_i$-Rep, compute by substituting terms
for variables in other terms. Substitution is expensive because it requires traversing and
copying a term in order to find and replace all occurrences of the given variable. A
well-known technique for mitigating these costs is to delay substitution until the binding of
the variable is required during evaluation [80, 2, 1]. This is accomplished by pairing an
open term with an environment that provides values for the free variables in the term.
The open term may be thought of as immutable code that acts on the environment.
Since the code is immutable, it can be generated once and shared among all instances of
a function.

Closure conversion [105, 111, 33, 78, 76, 9, 124, 54] is a program transformation that
achieves such a separation between code and data. Functions with free variables are
replaced by code abstracting an extra environment parameter. Free variables in the body
of the function are replaced by references to the environment. The abstracted code is
“partially applied” to an explicitly constructed environment providing the bindings for
these variables. This “partial application” of the code to its environment is, in fact,
suspended until the function is actually applied to its argument; the suspended application
is called a “closure”, a data structure containing pure code and a representation of its
environment.

The main ideas of closure conversion are illustrated by considering the following
monomorphic ML program:

let val x = 1
    val y = 2
    val z = 3
    val f = $\lambda$w. x + y + w
in
    f 100
end

The function f contains free variables x and y, but not z. We may eliminate the references
to these variables from the body of f by abstracting an environment env, and by replacing
x and y by references to the environment. In compensation, a suitable environment
containing the bindings for x and y must be passed to f before it is applied. This leads
to the following translation:

let val x = 1
    val y = 2
    val z = 3
    val f = ($\lambda$env. $\lambda$w. ($\pi_1$ env) + ($\pi_2$ env) + w) (x,y)
in
    f 100
end

References to x and y in the body of f are replaced by projections (field selections) $\pi_1$
and $\pi_2$ that access the corresponding component of the environment. Since the code for f
is closed, it may be hoisted out of the enclosing definition and defined at the top-level. I
ignore this “hoisting” phase and instead concentrate on the process of closure conversion.
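The translated program can be mirrored in Python, where the split between closed code and an explicit environment is easy to observe. This is a minimal sketch under assumptions: `f_code`, `make_closure`, and `apply_closure` are hypothetical names, and a closure is modelled as a plain pair of code and a flat environment tuple.

```python
def f_code(env, w):
    # Closed code: x and y are read from env (projections pi_1 and pi_2),
    # not from any enclosing scope.
    return env[0] + env[1] + w


def make_closure(code, env):
    # The suspended "partial application" of code to its environment.
    return (code, env)


def apply_closure(closure, arg):
    # Open the closure and pass the environment along with the argument.
    code, env = closure
    return code(env, arg)


x, y, z = 1, 2, 3
f = make_closure(f_code, (x, y))   # flat environment holding only x and y
result = apply_closure(f, 100)
```

Because `f_code` mentions no variables other than its parameters, it could be defined once at top level and shared by every closure built from it, which corresponds to the hoisting step mentioned above.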

In  the  preceding  example,  the  environment  contains  bindings  only  for  x  and  y,  and  is 
thus  as  small  as  possible.  Since  the  body  of  f  could  contain  an  occurrence  of  z,  it  is  also 
sensible  to  include  z  in  the  environment,  resulting  in  the  following  code: 

let val x = 1
    val y = 2
    val z = 3
    val f = ($\lambda$env. $\lambda$w. ($\pi_1$ env) + ($\pi_2$ env) + w) (x,y,z)
in
    f 100
end

In the above example I chose a “flat” (FAM-like [26]) representation of the
environment as a record with one field for each variable. Alternatively, I could choose a
“linked” (CAM-like [33]) representation where, for example, each binding is a separate
“frame” attached to the front of the remaining bindings. This idea leads to the following
translation:

let val x = 1
    val y = 2
    val z = 3
    val f = ($\lambda$env. $\lambda$w. ($\pi_1$($\pi_2$($\pi_2$ env))) + ($\pi_1$($\pi_2$ env)) + w)
            (z, (y, (x, ())))
in
    f 100
end

The linked representation facilitates sharing of environments, but at the expense of
introducing link traversals proportional to the nesting depth of the variable in the environment.
The linked representation can also support constant-time closure creation, but this
requires reusing the current environment and can result in bindings in the environment for
variables that do not occur free in the function (such as z above), leading to space leaks
[109].
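The trade-off of the linked representation can be seen in a short Python sketch. The names `lookup`, `f_code`, and `g_code` are hypothetical; environments are nested pairs `(binding, rest)` terminated by `()`, as in the translation above, and the second closure `g` is an invented example added only to show sharing.

```python
def lookup(env, depth):
    # Follow `depth` links (pi_2), then select the head binding (pi_1).
    for _ in range(depth):
        env = env[1]
    return env[0]


def f_code(env, w):
    # env = (z, (y, (x, ()))): x sits at depth 2, y at depth 1.
    return lookup(env, 2) + lookup(env, 1) + w


def g_code(env, w):
    # A second, hypothetical function whose environment starts at y.
    return lookup(env, 0) * w


x, y, z = 1, 2, 3
env_x = (x, ())
env_y = (y, env_x)
env_z = (z, env_y)      # constant-time extension that shares env_y

f = (f_code, env_z)     # reuses the current environment, so z is retained
g = (g_code, env_y)     # shares the tail of f's environment
```

The cost of reaching `x` grows with its depth in the chain, and `f` keeps `z` alive even though its code never reads it, which is the kind of retention that the text identifies as a space leak.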

Closure conversion for a language like $\lambda^{ML}_i$ where constructors are passed at run time
is complicated by the fact that we must account for free type variables as well as free value
variables within code. Furthermore, both value abstractions ($\lambda$-terms) and constructor
abstractions ($\Lambda$-terms) induce the creation of closures.

As an example, consider the expression:

\[ \lambda x{:}t_1.\ (x{:}t_1,\ y{:}t_2,\ z{:}\mathsf{int}) \]

of type $t_1 \to (t_1 \times t_2 \times \mathsf{int})$ where $t_1$ and $t_2$ are free type variables and y and z are free
value variables of type $t_2$ and int respectively. After closure conversion, this expression
is translated to the partial application

let val code =
      $\Lambda$tenv :: $\Omega \times \Omega$.
        $\lambda$venv : T($\pi_2$ tenv) $\times$ int.
          $\lambda$x : T($\pi_1$ tenv). (x, $\pi_1$ venv, $\pi_2$ venv)
in
    code ($t_1$,$t_2$) (y,z)
end

The code abstracts type environment (tenv) and value environment (venv) arguments.
The actual type environment, ($t_1$,$t_2$), is a constructor tuple with kind $\Omega \times \Omega$. The
actual value environment, (y,z), is a tuple with type $T(t_2) \times \mathsf{int}$. However, to keep the
code closed so that it may be hoisted and shared, all references to free type variables in
the type of venv must come from tenv. Since y has type $t_2$, the second component of tenv,
we give venv the type T($\pi_2$ tenv) $\times$ int.
Similarly, the code’s argument x is given the type T($\pi_1$ tenv). Consequently, the code
part of the closure is a closed expression of closed type $\sigma$, where

\[
\sigma = \forall tenv{::}\Omega \times \Omega.\
T(\pi_2\,tenv) \times \mathsf{int} \to T(\pi_1\,tenv) \to (T(\pi_1\,tenv) \times T(\pi_2\,tenv) \times \mathsf{int})
\]

It is easy to check that the entire expression has type $t_1 \to (t_1 \times t_2 \times \mathsf{int})$, and thus the
type of the original function is preserved.
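A rough Python analogue of this example passes the type environment and the value environment as explicit arguments. It is a sketch under assumptions: type representations are hypothetical tagged tuples, `inhabits` is an invented run-time check standing in for the static typing of venv and x, and `functools.partial` models the suspended application of the code to its two environments.

```python
from functools import partial

# Hypothetical run-time representations of the constructors Int and Float.
INT, FLOAT = ("Int",), ("Float",)


def inhabits(value, t):
    # Invented helper: a run-time stand-in for the judgment "value : T(t)".
    return isinstance(value, int) if t == INT else isinstance(value, float)


def code(tenv, venv, x):
    # tenv is the constructor tuple (t1, t2); venv is the value tuple (y, z).
    # y has type t2, so venv is checked against the second component of tenv,
    # and x is checked against the first.
    assert inhabits(venv[0], tenv[1]) and inhabits(venv[1], INT)
    assert inhabits(x, tenv[0])
    return (x, venv[0], venv[1])


# Instantiate t1 = Float and t2 = Int, with y = 7 and z = 9.
closure = partial(code, (FLOAT, INT), (7, 9))
```

Every type that `code` consults comes from `tenv` and every value from `venv` or `x`, so the function has no free variables of either sort and could be shared across instantiations.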


6.2  The  Target  Language:  $\lambda^{ML}_i$-Close

The target language of the closure conversion translation is called $\lambda^{ML}_i$-Close. The syntax
of this language is given in Figure 6.1. The constructors of the language are similar to
those of $\lambda^{ML}_i$-Rep, except that I have added unit, products, and projections for building
constructor environments. To the types, I have added a code type, code($t{::}\kappa$, $\sigma_1$, $\sigma_2$),
corresponding to both value-abstraction code (vcode) and type-abstraction code (tcode),
where t is the abstracted type environment and $\sigma_1$ is the abstracted value environment.
If $\sigma_2$ is an arrow-type, then the code type describes code for a value abstraction, and if
$\sigma_2$ is a $\forall$-type, then the code type describes code for a type abstraction.
The best way to understand these new constructs is to relate them informally to
standard $\lambda^{ML}_i$-Rep constructs as follows:

\[
\begin{array}{lcl}
\mathsf{code}(t{::}\kappa,\ \sigma,\ \sigma') & \approx & \forall t{::}\kappa.\ \sigma \to \sigma' \\
\mathsf{vcode}[t{::}\kappa,\ x{:}\sigma,\ x_1{:}\sigma_1,\cdots,x_k{:}\sigma_k].e & \approx & \Lambda t{::}\kappa.\ \lambda x{:}\sigma.\ \lambda[x_1{:}\sigma_1,\cdots,x_k{:}\sigma_k].\ e \\
\mathsf{tcode}[t{::}\kappa,\ x{:}\sigma,\ t'{::}\kappa'].e & \approx & \Lambda t{::}\kappa.\ \lambda x{:}\sigma.\ \Lambda t'{::}\kappa'.\ e \\
\langle\langle e_1, \mu, e_2 \rangle\rangle & \approx & (e_1\ [\mu])\ e_2
\end{array}
\]

Code terms abstract a type environment and a value environment. In the case of value
code, I also abstract a set of k value arguments; for type code, I abstract an additional
type argument. I have added a special closure form to terms, $\langle\langle e_1, \mu, e_2 \rangle\rangle$, where $e_1$ is the
code of the closure, $\mu$ is the type environment, and $e_2$ is the value environment. Closure
terms represent the delayed partial application of code to its environments.

Technically, I need to provide code and closure forms at the constructor level as
well as the term level. However, doing so requires redefining an appropriate notion of
constructor equivalence and reduction in the presence of constructor code and closures.
This in turn requires reproving properties, such as strong normalization and confluence
for reduction of constructors. To avoid this complexity and to simplify the presentation,
I will use standard $\lambda$-abstractions and partial applications to represent code and closures
at the constructor level.

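The constructor formation and equivalence rules are essentially the same as in
$\lambda^{ML}_i$-Rep (see Figure 5.2) except for the addition of rules pertaining to constructor-level
products. I add formation rules for products as follows:

\[
\frac{}{\Delta \vdash () :: 1}
\qquad
\frac{\Delta \vdash \mu_1 :: \kappa_1 \quad \Delta \vdash \mu_2 :: \kappa_2}{\Delta \vdash (\mu_1, \mu_2) :: \kappa_1 \times \kappa_2}
\qquad
\frac{\Delta \vdash \mu :: \kappa_1 \times \kappa_2}{\Delta \vdash \pi_i\,\mu :: \kappa_i}
\]

I add both $\beta$ and $\eta$-like rules governing equivalences of products as follows:

\[
\frac{\Delta \vdash \mu :: 1}{\Delta \vdash \mu \equiv () :: 1}
\qquad
\frac{\Delta \vdash \mu_1 :: \kappa_1 \quad \Delta \vdash \mu_2 :: \kappa_2}{\Delta \vdash \pi_i\,(\mu_1, \mu_2) \equiv \mu_i :: \kappa_i}
\qquad
\frac{\Delta \vdash \mu :: \kappa_1 \times \kappa_2}{\Delta \vdash (\pi_1\,\mu,\ \pi_2\,\mu) \equiv \mu :: \kappa_1 \times \kappa_2}
\]

\[
\begin{array}{llcl}
\text{(kinds)} & \kappa & ::= & \Omega \mid 1 \mid \kappa_1 \times \kappa_2 \mid \kappa_1 \to \kappa_2 \\[1ex]
\text{(constructors)} & \mu & ::= & t \mid \mathsf{Int} \mid \mathsf{Float} \mid \mathsf{Unit} \mid \mathsf{Prod}(\mu_1,\mu_2) \mid \mathsf{Arrow}([\mu_1,\cdots,\mu_k],\mu) \mid \\
 & & & () \mid (\mu_1,\mu_2) \mid \pi_1\,\mu \mid \pi_2\,\mu \mid \lambda t{::}\kappa.\mu \mid \mu_1\,\mu_2 \mid \\
 & & & \mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i;\ \mu_f;\ \mu_u;\ \mu_p;\ \mu_a) \\[1ex]
\text{(types)} & \sigma & ::= & T(\mu) \mid \mathsf{int} \mid \mathsf{float} \mid \mathsf{unit} \mid (\sigma_1 \times \sigma_2) \mid [\sigma_1,\cdots,\sigma_k] \to \sigma \mid \\
 & & & \forall t{::}\kappa.\sigma \mid \mathsf{code}(t{::}\kappa,\ \sigma_1,\ \sigma_2) \\[1ex]
\text{(expressions)} & e & ::= & x \mid i \mid f \mid () \mid (e_1,e_2) \mid \pi_1\,e \mid \pi_2\,e \mid \\
 & & & \mathsf{vcode}[t{::}\kappa,\ x{:}\sigma,\ x_1{:}\sigma_1,\cdots,x_k{:}\sigma_k].e \mid \mathsf{tcode}[t_1{::}\kappa_1,\ x{:}\sigma,\ t_2{::}\kappa_2].e \mid \\
 & & & \langle\langle e_1,\mu,e_2 \rangle\rangle \mid e\ [e_1,\cdots,e_n] \mid e[\mu] \mid \\
 & & & \mathsf{eqint}(e_1,e_2) \mid \mathsf{eqfloat}(e_1,e_2) \mid \mathsf{if0}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 \mid \\
 & & & \mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma]\,(e_i;\ e_f;\ e_u;\ e_p;\ e_a)
\end{array}
\]

Figure 6.1: Syntax of $\lambda^{ML}_i$-Close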


I  also  add  appropriate  congruences  for  both  product  and  projection  formation  (not  shown 
here). 

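The type formation and equivalence rules are the same as in $\lambda^{ML}_i$-Rep, with the
addition of rules governing code types. The code type formation rule is similar to the
one governing $\forall$:

\[
\frac{\Delta \uplus \{t{::}\kappa\} \vdash \sigma_1 \qquad \Delta \uplus \{t{::}\kappa\} \vdash \sigma_2}{\Delta \vdash \mathsf{code}(t{::}\kappa,\ \sigma_1,\ \sigma_2)}
\]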

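The interesting term formation rules pertain to code and closures and are as follows:

\[
\frac{
\begin{array}{c}
\Delta \uplus \{t{::}\kappa\} \vdash \sigma \\
\Delta \uplus \{t{::}\kappa\} \vdash \sigma_1 \ \cdots\ \Delta \uplus \{t{::}\kappa\} \vdash \sigma_k \\
\Delta \uplus \{t{::}\kappa\};\ \Gamma \uplus \{x{:}\sigma,\ x_1{:}\sigma_1,\cdots,x_k{:}\sigma_k\} \vdash e : \sigma'
\end{array}
}{\Delta;\Gamma \vdash \mathsf{vcode}[t{::}\kappa,\ x{:}\sigma,\ x_1{:}\sigma_1,\cdots,x_k{:}\sigma_k].e : \mathsf{code}(t{::}\kappa,\ \sigma,\ [\sigma_1,\cdots,\sigma_k] \to \sigma')}
\]

\[
\frac{
\begin{array}{c}
\Delta \uplus \{t{::}\kappa\} \vdash \sigma \qquad \Delta \uplus \{t{::}\kappa,\ t'{::}\kappa'\} \vdash \sigma' \\
\Delta \uplus \{t{::}\kappa,\ t'{::}\kappa'\};\ \Gamma \uplus \{x{:}\sigma\} \vdash e : \sigma'
\end{array}
}{\Delta;\Gamma \vdash \mathsf{tcode}[t{::}\kappa,\ x{:}\sigma,\ t'{::}\kappa'].e : \mathsf{code}(t{::}\kappa,\ \sigma,\ \forall t'{::}\kappa'.\sigma')}
\]

\[
\frac{\Delta;\Gamma \vdash e_1 : \mathsf{code}(t{::}\kappa,\ \sigma,\ \sigma') \qquad \Delta \vdash \mu :: \kappa \qquad \Delta;\Gamma \vdash e_2 : \{\mu/t\}\sigma}{\Delta;\Gamma \vdash \langle\langle e_1, \mu, e_2 \rangle\rangle : \{\mu/t\}\sigma'}
\]

The values, contexts, instructions, and rewriting rules for constructors are standard,
except for the addition of products (which is straightforward). If, at the constructor level,
I introduced code and closures instead of $\lambda$-abstractions, then we would consider these
constructs to be values (assuming the components of the closures are values). Application
of a closure to a constructor value would proceed by substituting both the environment
of the closure and the argument for the abstracted constructor variables in the code.
These issues are demonstrated at the term level.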

The values, contexts, and instructions for terms are standard except for the following
changes: first, I consider both vcode and tcode terms to be values, as well as closures
containing value components:

\[
\begin{array}{llcl}
\text{(values)} & v & ::= & \cdots \mid \mathsf{vcode}[t{::}\kappa,\ x{:}\sigma,\ x_1{:}\sigma_1,\cdots,x_k{:}\sigma_k].e \mid \\
 & & & \mathsf{tcode}[t{::}\kappa,\ x{:}\sigma,\ t'{::}\kappa'].e \mid \langle\langle v, u, v' \rangle\rangle
\end{array}
\]

I extend evaluation contexts so that closure components are evaluated in a left-to-right
fashion as follows:

\[
\begin{array}{llcl}
\text{(contexts)} & E & ::= & \cdots \mid \langle\langle E, \mu, e \rangle\rangle \mid \langle\langle v, u, E \rangle\rangle
\end{array}
\]

In the instructions, I replace application of abstractions to values with applications of
closures to values. I also add an instruction to evaluate the constructor component of a
closure. This yields instructions of the form

\[
\begin{array}{llcl}
\text{(instructions)} & I & ::= & \cdots \mid \langle\langle v, U[J], e \rangle\rangle \mid \langle\langle v', u, v \rangle\rangle\ [v_1,\cdots,v_k] \mid \langle\langle v', u, v \rangle\rangle\ [u']
\end{array}
\]

with the restriction that the first component of a closure must be an appropriate code
term, according to the application (see below).

Finally, the rewriting rules for both value and constructor application are as follows:

\[
\begin{array}{l}
E[\langle\langle \mathsf{vcode}[t{::}\kappa,\ x{:}\sigma,\ x_1{:}\sigma_1,\cdots,x_k{:}\sigma_k].e,\ u,\ v \rangle\rangle\ [v_1,\cdots,v_k]] \longmapsto
E[\{u/t,\ v/x,\ v_1/x_1,\cdots,v_k/x_k\}e] \\[1ex]
E[\langle\langle \mathsf{tcode}[t{::}\kappa,\ x{:}\sigma,\ t'{::}\kappa'].e,\ u,\ v \rangle\rangle\ [u']] \longmapsto E[\{u/t,\ v/x,\ u'/t'\}e]
\end{array}
\]

In each case, we open the closure and extract the code, type environment, and value
environment. We then substitute the environments and the argument(s) for the appropriate
variables in the code.
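The two rewriting rules can be imitated with a small closure record in Python. This is a sketch only, with several assumptions: parameter passing replaces substitution, Python functions stand in for vcode and tcode terms, and the names `Closure`, `apply_value`, `apply_type`, `add_code`, `id_tcode`, and `id_vcode` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Closure:
    code: Callable[..., Any]   # stands in for a vcode or tcode term
    tenv: Any                  # the constructor value u (type environment)
    venv: Any                  # the value v (value environment)


def apply_value(clo, *args):
    # <<vcode[...].e, u, v>> [v1, ..., vk]  |->  {u/t, v/x, v1/x1, ...}e
    return clo.code(clo.tenv, clo.venv, *args)


def apply_type(clo, u_arg):
    # <<tcode[...].e, u, v>> [u']  |->  {u/t, v/x, u'/t'}e
    return clo.code(clo.tenv, clo.venv, u_arg)


def add_code(tenv, venv, w):
    # Value code: reads x and y from the value environment.
    return venv[0] + venv[1] + w


def id_vcode(tenv, venv, x):
    # Value code for the identity at the type recorded in tenv.
    return x


def id_tcode(tenv, venv, t_arg):
    # Type code: applying it to a constructor yields a new value closure
    # whose type environment records the constructor argument.
    return Closure(id_vcode, (t_arg,), ())
```

Applying a closure never inspects the code itself; it only hands the two stored environments and the argument(s) to the code, which matches the "open the closure, then substitute" reading of the rules.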

It is straightforward to show that the static semantics of $\lambda^{ML}_i$-Close, as with $\lambda^{ML}_i$
and $\lambda^{ML}_i$-Rep, is sound with respect to the operational semantics and that type-checking
$\lambda^{ML}_i$-Close terms is decidable.

6.3  The  Closure  Conversion  Translation

The closure conversion translation is broken into a constructor translation, a type
translation, and a term translation. I use the source kind and type judgments to define these
translations, but I augment the judgments with additional structure to determine certain
details in the translation.

Throughout the translation, I use n-tuples at both the constructor and term levels
as abbreviations for right-associated binary products, terminated with a unit. For
instance, I use $(\kappa_1 \times \kappa_2 \times \cdots \times \kappa_n)$ to abbreviate the kind $\kappa_1 \times (\kappa_2 \times (\cdots \times (\kappa_n \times 1) \cdots))$
and $(\mu_1, \mu_2, \cdots, \mu_n)$ to abbreviate the constructor $(\mu_1, (\mu_2, (\cdots (\mu_n, ()) \cdots)))$.
Correspondingly, I use $\#1(\mu)$ as an abbreviation for $\pi_1\,\mu$ and $\#i(\mu)$ as an abbreviation for
$\#(i-1)(\pi_2\,\mu)$ when $i > 1$.
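The n-tuple abbreviation and the indexed projection are simple enough to spell out in code. In the Python sketch below, `ntuple` and `proj` are hypothetical helper names; nested pairs terminated by the empty tuple play the role of right-associated binary products terminated with a unit.

```python
def ntuple(*items):
    """Build (m1, (m2, (... (mn, ()) ...))) from m1, ..., mn."""
    result = ()
    for item in reversed(items):
        result = (item, result)
    return result


def proj(i, mu):
    """Indexed projection: #1(mu) = pi_1 mu, and #i(mu) = #(i-1)(pi_2 mu) for i > 1."""
    if i < 1:
        raise ValueError("projection index must be at least 1")
    if i == 1:
        return mu[0]            # pi_1
    return proj(i - 1, mu[1])   # step past one binding with pi_2
```

The recursion in `proj` takes exactly `i - 1` second projections followed by one first projection, so the cost of an access grows with the position of the component in the tuple.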

6.3.1  The  Constructor  and  Type  Translations 

I begin the translation by considering closure conversion of $\lambda^{ML}_i$-Rep constructors.
Constructor translation judgments are of the form

\[ \Delta_{env};\ \Delta_{arg} \vdash \mu :: \kappa \Rightarrow \mu', \]

where $\Delta_{env} \uplus \Delta_{arg} \vdash \mu :: \kappa$ is derivable from the constructor formation rules of $\lambda^{ML}_i$-Rep,
and $\mu'$ is a $\lambda^{ML}_i$-Close constructor. The axioms and inference rules that allow us to derive
this judgment are given in Figure 6.2.

In the constructor translation judgment, I split the kind assignment into two pieces:
$\Delta_{env}$ and $\Delta_{arg}$. The $\Delta_{arg}$ component contains a kind assignment for the type variable
bound by the nearest enclosing $\lambda$-abstraction (if any). The $\Delta_{env}$ component contains
a kind assignment for the other type variables in scope. The translation maps a type
variable found in $\Delta_{arg}$ to itself, but maps type variables in $\Delta_{env}$ to a projection from
a type environment data structure. This data structure is assumed to be bound to the
distinguished target variable tenv. Hence, I assume that tenv does not occur in the domain
of $\Delta_{arg}$. The projections assume that the order of bindings in the kind assignment does
not change, so I consider $\Delta_{env}$ to be an ordered sequence, binding variables to kinds.

The rest of the translation is straightforward with the exception of $\lambda$-abstractions, which we must closure-convert. In this case, I generate a piece of code of the form $\lambda t_{env}::|\Delta'_{env}|.\ \lambda t::\kappa_1.\ \mu'$, which abstracts both a type environment ($t_{env}$) and an argument ($t$). I also generate an environment, $\mu_{env}$ (discussed below). The code is obtained by choosing a new kind assignment, $\Delta'_{env}$, to replace $\Delta_{env}$, and by replacing $\Delta_{arg}$ with $\{t::\kappa_1\}$ in the translation of the body of the abstraction. Choosing the new kind assignment $\Delta'_{env}$ corresponds to deciding which variables will be preserved in the environment of the closure. Therefore, $\Delta'_{env}$ must be a subset of the bindings contained in $\Delta_{env} \uplus \Delta_{arg}$, and $\Delta'_{env}$ must contain bindings for all of the free type variables in the abstraction.
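The following is a minimal, untyped sketch of the variable and abstraction cases of the constructor translation. The tuple encoding of constructors, the names `free_vars` and `translate`, and the tags `"proj"`, `"code"`, and `"tuple"` are assumptions made for illustration and are not the thesis's syntax. Kinds are omitted, bound variables are assumed to be uniquely named, and no variable is assumed to be called `tenv`. The sketch always picks the smallest admissible new environment (exactly the free variables of the abstraction), although any subset of the bindings in scope that covers the free variables would do.

```python
# Constructors (hypothetical encoding):
#   ("var", t) | ("con", name) | ("lam", t, body) | ("app", f, a)

def free_vars(mu):
    tag = mu[0]
    if tag == "var":
        return {mu[1]}
    if tag == "con":
        return set()
    if tag == "lam":
        return free_vars(mu[2]) - {mu[1]}
    if tag == "app":
        return free_vars(mu[1]) | free_vars(mu[2])
    raise ValueError(f"unknown constructor: {mu!r}")


def translate(env_vars, arg_var, mu):
    """Translate mu under an ordered environment assignment and one argument variable."""
    tag = mu[0]
    if tag == "var":
        if mu[1] == arg_var:
            return mu  # argument variables map to themselves
        # environment variables map to a projection from tenv
        return ("proj", env_vars.index(mu[1]) + 1, "tenv")
    if tag == "con":
        return mu
    if tag == "app":
        return ("app",
                translate(env_vars, arg_var, mu[1]),
                translate(env_vars, arg_var, mu[2]))
    if tag == "lam":
        _, t, body = mu
        in_scope = env_vars + ([arg_var] if arg_var is not None else [])
        needed = free_vars(mu)
        new_env = [v for v in in_scope if v in needed]  # the chosen new assignment
        # Closed code: abstracts the type environment tenv and the argument t.
        code = ("code", "tenv", t, translate(new_env, t, body))
        # Environment: the translation of each preserved variable, in order.
        env = ("tuple", [translate(env_vars, arg_var, ("var", v)) for v in new_env])
        return ("app", code, env)  # the code applied to its environment
    raise ValueError(f"unknown constructor: {mu!r}")
```

Translating `("lam", "s", ("lam", "t", ("var", "s")))` with an empty environment yields an outer closure with an empty environment tuple whose body is an inner closure; the inner code refers to `s` only through the first projection of `tenv`, and the inner environment tuple contains `s` itself.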
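$$
\text{(var-arg)}\quad \Delta_{env};\ \{t::\kappa\} \vdash t :: \kappa \Rightarrow t
\qquad\qquad
\text{(unit)}\quad \Delta_{env};\ \Delta_{arg} \vdash \mathrm{Unit} :: \Omega \Rightarrow \mathrm{Unit}
$$

$$
\text{(var-env)}\quad \{t_1::\kappa_1, \ldots, t_n::\kappa_n\};\ \Delta_{arg} \vdash t_i :: \kappa_i \Rightarrow \#i(t_{env})
$$

$$
\text{(int)}\quad \Delta_{env};\ \Delta_{arg} \vdash \mathrm{Int} :: \Omega \Rightarrow \mathrm{Int}
\qquad\qquad
\text{(float)}\quad \Delta_{env};\ \Delta_{arg} \vdash \mathrm{Float} :: \Omega \Rightarrow \mathrm{Float}
$$

$$
\text{(prod)}\quad
\frac{\Delta_{env};\ \Delta_{arg} \vdash \mu_1 :: \Omega \Rightarrow \mu_1' \qquad \Delta_{env};\ \Delta_{arg} \vdash \mu_2 :: \Omega \Rightarrow \mu_2'}
     {\Delta_{env};\ \Delta_{arg} \vdash \mathrm{Prod}(\mu_1, \mu_2) :: \Omega \Rightarrow \mathrm{Prod}(\mu_1', \mu_2')}
$$

$$
\frac{\Delta_{env};\ \Delta_{arg} \vdash \mu_1 :: \Omega \Rightarrow \mu_1' \quad \cdots \quad \Delta_{env};\ \Delta_{arg} \vdash \mu_k :: \Omega \Rightarrow \mu_k' \qquad \Delta_{env};\ \Delta_{arg} \vdash \mu :: \Omega \Rightarrow \mu'}
     {\Delta_{env};\ \Delta_{arg} \vdash \mathrm{Arrow}([\mu_1, \ldots, \mu_k], \mu) :: \Omega \Rightarrow \mathrm{Arrow}([\mu_1', \ldots, \mu_k'], \mu')}
$$

$$
\text{(fn)}\quad
\frac{\Delta'_{env};\ \{t::\kappa_1\} \vdash \mu :: \kappa_2 \Rightarrow \mu' \qquad \Delta_{env};\ \Delta_{arg} \vdash_{env} \Delta'_{env} \Rightarrow \mu_{env}}
     {\Delta_{env};\ \Delta_{arg} \vdash \lambda t::\kappa_1.\ \mu :: \kappa_1 \rightarrow \kappa_2 \Rightarrow (\lambda t_{env}::|\Delta'_{env}|.\ \lambda t::\kappa_1.\ \mu')\ \mu_{env}}
$$

$$
\frac{\Delta_{env};\ \Delta_{arg} \vdash \mu_1 :: \kappa_1 \rightarrow \kappa_2 \Rightarrow \mu_1' \qquad \Delta_{env};\ \Delta_{arg} \vdash \mu_2 :: \kappa_1 \Rightarrow \mu_2'}
     {\Delta_{env};\ \Delta_{arg} \vdash \mu_1\ \mu_2 :: \kappa_2 \Rightarrow \mu_1'\ \mu_2'}
$$

$$
\text{(trec)}\quad
\frac{\begin{array}{c}
\Delta_{env};\ \Delta_{arg} \vdash \mu :: \Omega \Rightarrow \mu' \qquad \Delta_{env};\ \Delta_{arg} \vdash \mu_i, \mu_f, \mu_u :: \kappa \Rightarrow \mu_i', \mu_f', \mu_u' \\
\Delta_{env};\ \Delta_{arg} \vdash \mu_p :: \Omega \rightarrow \Omega \rightarrow \kappa \rightarrow \kappa \rightarrow \kappa \Rightarrow \mu_p' \\
\Delta_{env};\ \Delta_{arg} \vdash \mu_a :: \Omega_1 \rightarrow \cdots \rightarrow \Omega_k \rightarrow \Omega \rightarrow \kappa_1 \rightarrow \cdots \rightarrow \kappa_k \rightarrow \kappa \rightarrow \kappa \Rightarrow \mu_a'
\end{array}}
     {\begin{array}{c}
\Delta_{env};\ \Delta_{arg} \vdash \mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a) :: \kappa \Rightarrow \\
\mathrm{Typerec}\ \mu'\ \mathrm{of}\ (\mu_i'; \mu_f'; \mu_u'; \mu_p'; \mu_a')
\end{array}}
$$

$$
\text{(env)}\quad
\frac{\Delta_{env};\ \Delta_{arg} \vdash t_1 :: \kappa_1 \Rightarrow \mu_1 \quad \cdots \quad \Delta_{env};\ \Delta_{arg} \vdash t_n :: \kappa_n \Rightarrow \mu_n}
     {\Delta_{env};\ \Delta_{arg} \vdash_{env} \{t_1::\kappa_1, \ldots, t_n::\kappa_n\} \Rightarrow (\mu_1, \ldots, \mu_n)}
$$

Figure 6.2: Closure Conversion of Constructors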
I construct the environment of the closure using the auxiliary judgment

$$
\Delta_{env};\ \Delta_{arg} \vdash_{env} \Delta'_{env} \Rightarrow \mu_{env}.
$$

With the env rule, I create an environment corresponding to $\Delta'_{env}$ by extracting the value corresponding to each variable in the domain of $\Delta'_{env}$, and then packing these values in order into a tuple. If $\Delta'_{env} = \{t_1::\kappa_1, \ldots, t_n::\kappa_n\}$, then the kind of the resulting environment is $(\kappa_1 \times \cdots \times \kappa_n)$, which I abbreviate as $|\Delta'_{env}|$.

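To translate types, I use judgments of the form $\Delta_{env};\ \Delta_{arg} \vdash \sigma \Rightarrow \sigma'$. The translation maps $\lambda_i^{ML}$-Rep types to the same $\lambda_i^{ML}$-Close types, except that the injected constructors are converted via the constructor translation:

$$
\frac{\Delta_{env};\ \Delta_{arg} \vdash \mu :: \Omega \Rightarrow \mu'}
     {\Delta_{env};\ \Delta_{arg} \vdash T(\mu) \Rightarrow T(\mu')}
$$

The type translation of a polytype extends the current argument assignment, $\Delta_{arg}$, with the bound type variable during the translation of the body of the polytype:

$$
\frac{\Delta_{env};\ \Delta_{arg} \uplus \{t::\kappa\} \vdash \sigma \Rightarrow \sigma'}
     {\Delta_{env};\ \Delta_{arg} \vdash \forall t::\kappa.\sigma \Rightarrow \forall t::\kappa.\sigma'}
$$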


6.3.2  The  Term  Translation 

The term translation for closure conversion mirrors the constructor translation, except that I must account for both free type variables and free value variables. Judgments in the translation are of the form

$$
\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash e : \sigma \Rightarrow e'
$$

where $\Delta_{env} \uplus \Delta_{arg};\ \Gamma_{env} \uplus \Gamma_{arg} \vdash e : \sigma$ is a $\lambda_i^{ML}$-Rep term formation judgment and $e'$ is a $\lambda_i^{ML}$-Close expression. The important axioms and inference rules that let us derive this judgment are given in Figure 6.3. The rest of the rules simply map $\lambda_i^{ML}$-Rep terms to their corresponding $\lambda_i^{ML}$-Close terms.

Like the constructor translation, the kind assignment and the type assignment are split into environment and argument components. The argument component assigns kinds/types to the variables of the nearest enclosing $\lambda$ or $\Lambda$-expression (if any); the environment component assigns kinds/types to the other free variables in scope. As in the constructor case, I translate a variable occurring in $\Gamma_{arg}$ to itself, whereas I translate a variable occurring in $\Gamma_{env}$ to a projection from a distinguished variable, $x_{env}$. Again, the order of bindings in $\Gamma_{env}$ is relevant to the translation.

The abs and tabs rules translate abstractions to closures consisting of code, a constructor environment ($\mu_{env}$), and a value environment ($e_{env}$). The code components are
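$$
\text{(var-arg)}\quad \Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \{x_1:\sigma_1, \ldots, x_k:\sigma_k\} \vdash x_i : \sigma_i \Rightarrow x_i
$$

$$
\text{(var-env)}\quad \Delta_{env};\ \Delta_{arg};\ \{x_1:\sigma_1, \ldots, x_n:\sigma_n\};\ \Gamma_{arg} \vdash x_i : \sigma_i \Rightarrow \#i(x_{env})
$$

$$
\text{(tapp)}\quad
\frac{\Delta_{env};\ \Delta_{arg} \vdash \mu :: \kappa \Rightarrow \mu' \qquad \Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash e : \forall t::\kappa.\sigma \Rightarrow e'}
     {\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash e\,[\mu] : \{\mu/t\}\sigma \Rightarrow e'\,[\mu']}
$$

$$
\text{(abs)}\quad
\frac{\begin{array}{c}
\Delta_{env};\ \Delta_{arg} \vdash_{env} \Delta'_{env} \Rightarrow \mu_{env} \qquad \Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash_{env} \Gamma'_{env} \Rightarrow e_{env} \\
\Delta'_{env};\ \emptyset \vdash_{env\text{-}type} \Gamma'_{env} \Rightarrow \Gamma''_{env} \\
\Delta'_{env};\ \emptyset \vdash \sigma_1 \Rightarrow \sigma_1' \quad \cdots \quad \Delta'_{env};\ \emptyset \vdash \sigma_k \Rightarrow \sigma_k' \\
\Delta'_{env};\ \emptyset;\ \Gamma'_{env};\ \{x_1:\sigma_1, \ldots, x_k:\sigma_k\} \vdash e : \sigma \Rightarrow e'
\end{array}}
     {\begin{array}{c}
\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash \lambda[x_1:\sigma_1, \ldots, x_k:\sigma_k].\ e : [\sigma_1, \ldots, \sigma_k] \rightarrow \sigma \Rightarrow \\
\langle\!\langle \mathrm{vcode}[t_{env}::|\Delta'_{env}|,\ x_{env}:|\Gamma''_{env}|,\ x_1:\sigma_1', \ldots, x_k:\sigma_k'].\ e',\ \mu_{env},\ e_{env} \rangle\!\rangle
\end{array}}
$$

$$
\text{(tabs)}\quad
\frac{\begin{array}{c}
\Delta_{env};\ \Delta_{arg} \vdash_{env} \Delta'_{env} \Rightarrow \mu_{env} \qquad \Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash_{env} \Gamma'_{env} \Rightarrow e_{env} \\
\Delta'_{env};\ \emptyset \vdash_{env\text{-}type} \Gamma'_{env} \Rightarrow \Gamma''_{env} \\
\Delta'_{env};\ \{t::\kappa\};\ \Gamma'_{env};\ \emptyset \vdash e : \sigma \Rightarrow e'
\end{array}}
     {\begin{array}{c}
\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash \Lambda t::\kappa.\ e : \forall t::\kappa.\sigma \Rightarrow \\
\langle\!\langle \mathrm{tcode}[t_{env}::|\Delta'_{env}|,\ x_{env}:|\Gamma''_{env}|,\ t::\kappa].\ e',\ \mu_{env},\ e_{env} \rangle\!\rangle
\end{array}}
$$

Figure 6.3: Closure Conversion of Terms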
constructed by choosing new kind and type assignments to cover the free variables of the
abstraction.

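The environment components are constructed using the auxiliary judgments

$$
\Delta_{env};\ \Delta_{arg} \vdash_{env} \Delta'_{env} \Rightarrow \mu_{env}
$$

and

$$
\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash_{env} \Gamma'_{env} \Rightarrow e_{env}.
$$

The former judgment is obtained via the env constructor translation rule, whereas the latter judgment is obtained via the following term translation rule:

$$
\text{(env)}\quad
\frac{\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash x_1 : \sigma_1 \Rightarrow e_1 \quad \cdots \quad \Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash x_n : \sigma_n \Rightarrow e_n}
     {\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash_{env} \{x_1:\sigma_1, \ldots, x_n:\sigma_n\} \Rightarrow (e_1, \ldots, e_n)}
$$

This rule translates a new type assignment, $\Gamma'_{env}$, by extracting the values corresponding to each variable in the domain of the assignment and placing the resulting values in a tuple. To obtain the type of the resulting environment data structure, I first translate all of the types in the range of $\Gamma'_{env}$:

$$
\text{(env-type)}\quad
\frac{\Delta_{env};\ \Delta_{arg} \vdash \sigma_1 \Rightarrow \sigma_1' \quad \cdots \quad \Delta_{env};\ \Delta_{arg} \vdash \sigma_n \Rightarrow \sigma_n'}
     {\Delta_{env};\ \Delta_{arg} \vdash_{env\text{-}type} \{x_1:\sigma_1, \ldots, x_n:\sigma_n\} \Rightarrow \{x_1:\sigma_1', \ldots, x_n:\sigma_n'\}}
$$

This process results in a $\lambda_i^{ML}$-Close type assignment $\Gamma''_{env} = \{x_1:\sigma_1', \ldots, x_n:\sigma_n'\}$. The tuple environment data structure has the type $(\sigma_1' \times \cdots \times \sigma_n')$, which I abbreviate as $|\Gamma''_{env}|$.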
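The term-level translation can be sketched for a small untyped language. Everything named here is an assumption for illustration: the tuple encoding of terms, the functions `convert`, `run_source`, and `run_target`, and the use of flat Python tuples for environments instead of nested pairs. Types and type environments are omitted, and no source variable is assumed to be named `xenv`. The two evaluators make it possible to observe on an example that a term and its translation compute the same integer, which is the behaviour the correctness argument of the next section establishes for the typed translation.

```python
# Source terms (hypothetical encoding):
#   ("int", n) | ("var", x) | ("lam", x, body) | ("app", f, a) | ("add", a, b)

def free_vars(e):
    tag = e[0]
    if tag == "int":
        return set()
    if tag == "var":
        return {e[1]}
    if tag == "lam":
        return free_vars(e[2]) - {e[1]}
    return free_vars(e[1]) | free_vars(e[2])  # app, add


def convert(env_vars, arg_var, e):
    tag = e[0]
    if tag == "int":
        return e
    if tag == "var":
        if e[1] == arg_var:
            return e  # argument variable: unchanged
        return ("proj", env_vars.index(e[1]) + 1, ("var", "xenv"))
    if tag == "lam":
        _, x, body = e
        new_env = sorted(free_vars(e))  # the chosen new type assignment
        code = ("vcode", x, convert(new_env, x, body))
        env = ("tuple", [convert(env_vars, arg_var, ("var", v)) for v in new_env])
        return ("closure", code, env)
    return (tag, convert(env_vars, arg_var, e[1]), convert(env_vars, arg_var, e[2]))


def run_source(e, rho):
    tag = e[0]
    if tag == "int":
        return e[1]
    if tag == "var":
        return rho[e[1]]
    if tag == "lam":
        return ("fun", e[1], e[2], rho)
    if tag == "add":
        return run_source(e[1], rho) + run_source(e[2], rho)
    _, x, body, saved = run_source(e[1], rho)  # app
    return run_source(body, {**saved, x: run_source(e[2], rho)})


def run_target(e, rho):
    tag = e[0]
    if tag == "int":
        return e[1]
    if tag == "var":
        return rho[e[1]]
    if tag == "proj":
        return run_target(e[2], rho)[e[1] - 1]
    if tag == "tuple":
        return tuple(run_target(c, rho) for c in e[1])
    if tag == "closure":
        (_, x, body), env = e[1], e[2]
        return ("clo", x, body, run_target(env, rho))
    if tag == "add":
        return run_target(e[1], rho) + run_target(e[2], rho)
    _, x, body, env_value = run_target(e[1], rho)  # app
    # The code is closed: it sees only its environment tuple and its argument.
    return run_target(body, {"xenv": env_value, x: run_target(e[2], rho)})
```

Note that `run_target` evaluates a code body in a fresh environment containing only `xenv` and the argument, so a translation that left a free variable in the code would fail with a lookup error rather than silently succeed.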


6.4  Correctness  of  the  Translation 

To prove the correctness of the closure conversion translation, I will establish suitable relations between source and target constructors and terms, and then show that a source construct is always related to its translation. I first examine the correctness of constructor translation and then consider term translation.


6.4.1  Correctness  of  the  Constructor  Translation 

It is clear that if $\Delta_{env};\ \Delta_{arg} \vdash \mu :: \kappa \Rightarrow \mu'$, then $\Delta_{env} \uplus \Delta_{arg} \vdash \mu :: \kappa$. It is also fairly easy to show that we can always construct some translation of a well-formed constructor. I only need to show that we can always delay any reordering or strengthening of the kind assignment until we reach a use of an abs rule. Finally, it is also easy to show via induction on the translation that constructor closure conversion preserves kinds directly.

Lemma 6.4.1 (Kind Correctness) If $\Delta_{env};\ \Delta_{arg} \vdash \mu :: \kappa \Rightarrow \mu'$, then $\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg} \vdash \mu' :: \kappa$.

I want to show that a constructor and its translation are suitably equivalent. I begin by establishing a set of kind-indexed simulation relations relating closed source and target constructors. At base-kind, the relation is given as follows:

$$
\frac{\emptyset \vdash \mu \equiv u :: \Omega \qquad \emptyset \vdash \mu' \equiv u :: \Omega}
     {\mu \approx_{\Omega} \mu'}
$$

Two closed constructors of kind $\Omega$ are related if they are definitionally equivalent to the same constructor value. That is, to determine whether a source and target constructor are related, we simply normalize the constructors and then syntactically compare them. (Recall that constructors of the source language are a subset of the constructors in the target language and the two languages coincide at the base kind $\Omega$.) I logically extend the base relation to arrow kinds:

$$
\frac{\mu_1 \approx_{\kappa_1} \mu_1' \ \text{ implies } \ \mu\ \mu_1 \approx_{\kappa_2} \mu'\ \mu_1'}
     {\mu \approx_{\kappa_1 \rightarrow \kappa_2} \mu'}
$$

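Let $\delta$ and $\delta'$ range over substitutions of type variables for closed source and target constructors, respectively. I extend the relation to substitutions indexed by kind assignments:

$$
\frac{\mathrm{Dom}(\delta) = \{t_1, \ldots, t_n\} \qquad \forall\, 1 \le i \le n.\ \delta(t_i) \approx_{\kappa_i} \#i(\mu)}
     {\delta \approx_{\{t_1::\kappa_1, \ldots, t_n::\kappa_n\}} \{t_{env} = \mu\}}
$$

Note that the distinguished type variable $t_{env}$ is used in the target substitution. Finally, I relate pairs of substitutions $\delta_{env}; \delta_{arg}$ and $\delta'_{env}; \delta'_{arg}$ as follows:

$$
\frac{\begin{array}{c}
\delta_{env} \approx_{\Delta_{env}} \delta'_{env} \\
\mathrm{Dom}(\delta_{arg}) = \mathrm{Dom}(\Delta_{arg}) = \mathrm{Dom}(\delta'_{arg}) \\
\forall t \in \mathrm{Dom}(\Delta_{arg}).\ \delta_{arg}(t) \approx_{\Delta_{arg}(t)} \delta'_{arg}(t)
\end{array}}
     {\delta_{env}; \delta_{arg} \approx_{\Delta_{env};\Delta_{arg}} \delta'_{env}; \delta'_{arg}}
$$

The pairs of substitutions are related iff $\delta_{env}$ and $\delta'_{env}$ are related under $\Delta_{env}$, and for any argument variable $t$, $\delta_{arg}(t)$ and $\delta'_{arg}(t)$ are related at $\Delta_{arg}(t)$.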

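With these definitions in place, I can state and prove the correctness of the constructor translation. The first step is to show that constructing new environments from related constructors yields related environments.

Lemma 6.4.2 If $\delta_{env}; \delta_{arg} \approx_{\Delta_{env};\Delta_{arg}} \delta'_{env}; \delta'_{arg}$ and $\Delta_{env};\ \Delta_{arg} \vdash_{env} \Delta'_{env} \Rightarrow (\mu_1, \ldots, \mu_n)$, where $\Delta'_{env} = \{t_1::\kappa_1, \ldots, t_n::\kappa_n\}$, then $\{t_1 = \delta_{env} \uplus \delta_{arg}(t_1), \ldots, t_n = \delta_{env} \uplus \delta_{arg}(t_n)\} \approx_{\Delta'_{env}} \{t_{env} = \delta'_{env} \uplus \delta'_{arg}(\mu_1, \ldots, \mu_n)\}$.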
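Proof: I must show that $\delta_{env} \uplus \delta_{arg}(t_i) \approx_{\kappa_i} \#i(\delta'_{env} \uplus \delta'_{arg}(\mu_1, \ldots, \mu_n))$ for $1 \le i \le n$. Hence, it suffices to show $\delta_{env} \uplus \delta_{arg}(t_i) \approx_{\kappa_i} \delta'_{env} \uplus \delta'_{arg}(\mu_i)$. By an examination of the env rule, $\Delta_{env};\ \Delta_{arg} \vdash t_i :: \kappa_i \Rightarrow \mu_i$. There are two cases to consider, depending on the rule used to produce $\mu_i$: either $t_i$ is translated under the var-arg rule or else $t_i$ is translated under the var-env rule. In the former case, $\mu_i = t_i$. By assumption, $\delta_{env}; \delta_{arg} \approx_{\Delta_{env};\Delta_{arg}} \delta'_{env}; \delta'_{arg}$, thus $\delta_{arg}(t_i) \approx_{\kappa_i} \delta'_{arg}(t_i)$. In the latter case, we have $\mu_i = \#j(t_{env})$, $t_i = t'_j$, and $\Delta_{env}$ is of the form $\{t'_1::\kappa'_1, \ldots, t'_j::\kappa'_j, \ldots, t'_m::\kappa'_m\}$. By assumption, $\delta_{env} \approx_{\Delta_{env}} \delta'_{env}$. Therefore, by the definition of the relation, for all $1 \le k \le m$, $\delta_{env}(t'_k) \approx_{\kappa'_k} \#k(\delta'_{env}(t_{env}))$. In particular, since $t'_j = t_i$, $\delta_{env}(t_i) \approx_{\kappa_i} \#j(\delta'_{env}(t_{env}))$. Consequently, in either case we know that $\delta_{env} \uplus \delta_{arg}(t_i) \approx_{\kappa_i} \delta'_{env} \uplus \delta'_{arg}(\mu_i)$. □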

Next, by induction on the derivation of a translation, I show that a constructor and its translation are related when we apply related substitutions. In particular, the translation of a constructor of base kind yields a constructor that is definitionally equivalent to that constructor.

Theorem 6.4.3 (Constructor Correctness) If $\Delta_{env};\ \Delta_{arg} \vdash \mu :: \kappa \Rightarrow \mu'$ and $\delta_{env}; \delta_{arg} \approx_{\Delta_{env};\Delta_{arg}} \delta'_{env}; \delta'_{arg}$, then $\delta_{env} \uplus \delta_{arg}(\mu) \approx_{\kappa} \delta'_{env} \uplus \delta'_{arg}(\mu')$.

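Proof: By induction on the derivation of $\Delta_{env};\ \Delta_{arg} \vdash \mu :: \kappa \Rightarrow \mu'$. The interesting cases, var-arg, var-env, fn, and trec are given below.

var-arg: $\Delta_{env};\ \{t::\kappa\} \vdash t :: \kappa \Rightarrow t$. By assumption, $\delta_{arg}(t) \approx_{\kappa} \delta'_{arg}(t)$.

var-env: $\{t_1::\kappa_1, \ldots, t_n::\kappa_n\};\ \Delta_{arg} \vdash t_i :: \kappa_i \Rightarrow \#i(t_{env})$. By assumption, $\delta_{env}(t_i) \approx_{\kappa_i} \#i(\delta'_{env}(t_{env}))$.

fn: $\Delta_{env};\ \Delta_{arg} \vdash \lambda t::\kappa_1.\ \mu :: \kappa_1 \rightarrow \kappa_2 \Rightarrow (\lambda t_{env}::|\Delta'_{env}|.\ \lambda t::\kappa_1.\ \mu')\ \mu_{env}$. Let $\mu_1 \approx_{\kappa_1} \mu_1'$. I must show

$$
(\lambda t::\kappa_1.\ \delta_{env} \uplus \delta_{arg}(\mu))\ \mu_1 \approx_{\kappa_2} (\delta'_{env} \uplus \delta'_{arg}((\lambda t_{env}::|\Delta'_{env}|.\ \lambda t::\kappa_1.\ \mu')\ \mu_{env}))\ \mu_1'.
$$

Since the code of the closure is closed, it suffices to show

$$
(\delta_{env} \uplus \delta_{arg} \uplus \{t = \mu_1\})(\mu) \approx_{\kappa_2} \{t_{env} = \delta'_{env} \uplus \delta'_{arg}(\mu_{env}),\ t = \mu_1'\}(\mu').
$$

Since $\Delta'_{env};\ \{t::\kappa_1\} \vdash \mu :: \kappa_2$, I can drop the bindings of variables in $\delta_{env} \uplus \delta_{arg}$ that do not occur in $\Delta'_{env}$. Let $\delta''_{env} = \{t = \delta_{env} \uplus \delta_{arg}(t) \mid t \in \mathrm{Dom}(\Delta'_{env})\}$. I must now show

$$
(\delta''_{env} \uplus \{t = \mu_1\})(\mu) \approx_{\kappa_2} \{t_{env} = \delta'_{env} \uplus \delta'_{arg}(\mu_{env}),\ t = \mu_1'\}(\mu').
$$

By the inductive hypothesis, this holds if I can show that $\delta''_{env}; \{t = \mu_1\} \approx_{\Delta'_{env};\{t::\kappa_1\}} \{t_{env} = \delta'_{env} \uplus \delta'_{arg}(\mu_{env})\}; \{t = \mu_1'\}$. By assumption, $\mu_1 \approx_{\kappa_1} \mu_1'$, so I only need to show that

$$
\delta''_{env} \approx_{\Delta'_{env}} \{t_{env} = \delta'_{env} \uplus \delta'_{arg}(\mu_{env})\}.
$$

But this holds directly from lemma 6.4.2.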
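trec: We have

$$
\Delta_{env};\ \Delta_{arg} \vdash \mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a) :: \kappa \Rightarrow \mathrm{Typerec}\ \mu'\ \mathrm{of}\ (\mu_i'; \mu_f'; \mu_u'; \mu_p'; \mu_a').
$$

By assumption, $\delta_{env} \uplus \delta_{arg}(\mu) \approx_{\Omega} \delta'_{env} \uplus \delta'_{arg}(\mu')$. Therefore, these two constructors have the same normal forms when their respective substitutions are applied. Let $\mu_0$ be this normal form. I argue by induction on the structure of $\mu_0$ that

$$
\delta_{env} \uplus \delta_{arg}(\mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a)) \approx_{\kappa} \delta'_{env} \uplus \delta'_{arg}(\mathrm{Typerec}\ \mu'\ \mathrm{of}\ (\mu_i'; \mu_f'; \mu_u'; \mu_p'; \mu_a')).
$$

If $\mu_0$ is Int, then

$$
\delta_{env} \uplus \delta_{arg}(\mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a)) \equiv \delta_{env} \uplus \delta_{arg}(\mu_i)
$$

and

$$
\delta'_{env} \uplus \delta'_{arg}(\mathrm{Typerec}\ \mu'\ \mathrm{of}\ (\mu_i'; \mu_f'; \mu_u'; \mu_p'; \mu_a')) \equiv \delta'_{env} \uplus \delta'_{arg}(\mu_i').
$$

By the outer inductive hypothesis, these two constructors are related at $\kappa$. Similar reasoning shows that the result holds for $\mu_0 = \mathrm{Float}$ and $\mu_0 = \mathrm{Unit}$.

If $\mu_0$ is $\mathrm{Prod}(\mu_1, \mu_2)$, then

$$
\delta_{env} \uplus \delta_{arg}(\mathrm{Typerec}\ \mu\ \mathrm{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a)) \equiv \delta_{env} \uplus \delta_{arg}(\mu_p\ \mu_1\ \mu_2\ \mu_a\ \mu_b)
$$

where $\mu_a = \mathrm{Typerec}\ \mu_1\ \mathrm{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a)$ and $\mu_b = \mathrm{Typerec}\ \mu_2\ \mathrm{of}\ (\mu_i; \mu_f; \mu_u; \mu_p; \mu_a)$. Likewise,

$$
\delta'_{env} \uplus \delta'_{arg}(\mathrm{Typerec}\ \mu'\ \mathrm{of}\ (\mu_i'; \mu_f'; \mu_u'; \mu_p'; \mu_a')) \equiv \delta'_{env} \uplus \delta'_{arg}(\mu_p'\ \mu_1\ \mu_2\ \mu_a'\ \mu_b')
$$

where $\mu_a' = \mathrm{Typerec}\ \mu_1\ \mathrm{of}\ (\mu_i'; \mu_f'; \mu_u'; \mu_p'; \mu_a')$ and $\mu_b' = \mathrm{Typerec}\ \mu_2\ \mathrm{of}\ (\mu_i'; \mu_f'; \mu_u'; \mu_p'; \mu_a')$. By the inner induction hypothesis, $\delta_{env} \uplus \delta_{arg}(\mu_a) \approx_{\kappa} \delta'_{env} \uplus \delta'_{arg}(\mu_a')$ and $\delta_{env} \uplus \delta_{arg}(\mu_b) \approx_{\kappa} \delta'_{env} \uplus \delta'_{arg}(\mu_b')$. By the outer induction hypothesis, $\delta_{env} \uplus \delta_{arg}(\mu_p) \approx_{\kappa'} \delta'_{env} \uplus \delta'_{arg}(\mu_p')$ where $\kappa' = \Omega \rightarrow \Omega \rightarrow \kappa \rightarrow \kappa \rightarrow \kappa$. Hence, by the definition of the relations at arrow kinds, we know that

$$
\delta_{env} \uplus \delta_{arg}(\mu_p\ \mu_1\ \mu_2\ \mu_a\ \mu_b) \approx_{\kappa} \delta'_{env} \uplus \delta'_{arg}(\mu_p'\ \mu_1\ \mu_2\ \mu_a'\ \mu_b').
$$

Similar reasoning shows that the result holds for $\mu_0 = \mathrm{Arrow}([\mu_1, \ldots, \mu_k], \mu)$.

□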


6.4.2  Type  Correctness  of  the  Term  Translation 

The  next  step  in  proving  the  correctness  of  closure  conversion  is  to  show  that  each 
translated  term  has  the  translated  type.  I  begin  by  showing  that  the  type  translation 
commutes  with  substitution. 
Lemma 6.4.4 If $\Delta_{env};\ \Delta_{arg} \uplus \{t::\kappa\} \vdash \sigma \Rightarrow \sigma'$ and $\Delta_{env};\ \Delta_{arg} \vdash \mu :: \kappa \Rightarrow \mu'$, then $\Delta_{env};\ \Delta_{arg} \vdash \{\mu/t\}\sigma \Rightarrow \{\mu'/t\}\sigma'$.

Proof: By induction on the derivation of $\Delta_{env};\ \Delta_{arg} \uplus \{t::\kappa\} \vdash \sigma \Rightarrow \sigma'$. The base case, $T(\mu)$, relies upon the correctness of the constructor translation. □

The following lemma is critical for showing that closures are well-formed. Roughly speaking, it shows that a type obtained from the “current” constructor context ($\Delta_{env}; \Delta_{arg}$) is equivalent to the type obtained from the closure’s context, as long as we substitute the closure’s environment for the abstracted environment variable.


Lemma 6.4.5 If $\Delta_{env};\ \Delta_{arg} \vdash \sigma \Rightarrow \sigma_1$, and $\Delta_{env};\ \Delta_{arg} \vdash_{env} \Delta'_{env} \Rightarrow \mu$, then $\Delta'_{env};\ \emptyset \vdash \sigma \Rightarrow \sigma_2$ and $\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg} \vdash \sigma_1 \equiv \{\mu/t_{env}\}\sigma_2$.


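Theorem 6.4.6 (Type Correctness) If $\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash e : \sigma \Rightarrow e'$, $\Delta_{env};\ \Delta_{arg} \vdash_{env\text{-}type} \Gamma_{env} \Rightarrow \Gamma'_{env}$, $\Delta_{env};\ \Delta_{arg} \vdash_{env\text{-}type} \Gamma_{arg} \Rightarrow \Gamma'_{arg}$, and $\Delta_{env};\ \Delta_{arg} \vdash \sigma \Rightarrow \sigma'$, then $\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg};\ \{x_{env}:|\Gamma'_{env}|\} \uplus \Gamma'_{arg} \vdash e' : \sigma'$.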

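Proof: By induction on the derivation of $\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash e : \sigma \Rightarrow e'$. The most interesting cases are the translations of variables and $\lambda$-abstractions (shown below). The other cases follow in a straightforward fashion. In particular, the treatment of $\Lambda$-abstractions almost directly follows the treatment of $\lambda$-abstractions.


var-arg: We have $\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \{x_1:\sigma_1, \ldots, x_n:\sigma_n\} \vdash x_i : \sigma_i \Rightarrow x_i$. By the env-type translation rule, $\Gamma'_{arg} = \{x_1:\sigma_1', \ldots, x_n:\sigma_n'\}$ where $\Delta_{env};\ \Delta_{arg} \vdash \sigma_i \Rightarrow \sigma_i'$. Thus, $\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg};\ \{x_{env}:|\Gamma'_{env}|\} \uplus \Gamma'_{arg} \vdash x_i : \sigma_i'$.

var-env: We have $\Delta_{env};\ \Delta_{arg};\ \{x_1:\sigma_1, \ldots, x_n:\sigma_n\};\ \Gamma_{arg} \vdash x_i : \sigma_i \Rightarrow \#i(x_{env})$. By the env-type translation rule, $\Gamma'_{env} = \{x_1:\sigma_1', \ldots, x_n:\sigma_n'\}$ where $\Delta_{env};\ \Delta_{arg} \vdash \sigma_i \Rightarrow \sigma_i'$. Thus, $|\Gamma'_{env}| = (\sigma_1' \times \cdots \times \sigma_n')$. Hence, $\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg};\ \{x_{env}:|\Gamma'_{env}|\} \uplus \Gamma'_{arg} \vdash \#i(x_{env}) : \sigma_i'$.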


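abs: To simplify the proof, I only show the case for 1-argument functions. We have

$$
\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash \lambda x:\sigma_a.\ e : \sigma_a \rightarrow \sigma_b \Rightarrow \langle\!\langle e_c,\ \mu_{env},\ e_{env} \rangle\!\rangle
$$

where $e_c$ is $\mathrm{vcode}[t_{env}::|\Delta'_{env}|,\ x_{env}:|\Gamma''_{env}|,\ x:\sigma_a''].\ e'$. By the inductive hypothesis, we know that

$$
\{t_{env}::|\Delta'_{env}|\};\ \{x_{env}:|\Gamma''_{env}|\} \uplus \{x:\sigma_a''\} \vdash e' : \sigma_b'',
$$

where $\Delta'_{env};\ \emptyset \vdash_{env\text{-}type} \Gamma'_{env} \Rightarrow \Gamma''_{env}$, $\Delta'_{env};\ \emptyset \vdash \sigma_a \Rightarrow \sigma_a''$, and $\Delta'_{env};\ \emptyset \vdash \sigma_b \Rightarrow \sigma_b''$. From this and the typing rule for vcode, we can conclude that

$$
\emptyset;\ \emptyset \vdash e_c : \mathrm{code}(t_{env}::|\Delta'_{env}|,\ |\Gamma''_{env}|,\ \sigma_a'' \rightarrow \sigma_b'').
$$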
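From kind-preservation of the constructor translation, we know that $\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg} \vdash \mu_{env} :: |\Delta'_{env}|$. Thus, the code and the type environment agree on kinds. I only need to show that the code and the value environment agree on types.

Suppose the new type assignment chosen by the abs rule is $\{x_1:\sigma_1, \ldots, x_n:\sigma_n\}$. Then $e_{env} = (e_1, \ldots, e_n)$ where

$$
\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash x_i : \sigma_i \Rightarrow e_i
$$

for $1 \le i \le n$. By the induction hypothesis,

$$
\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg};\ \{x_{env}:|\Gamma'_{env}|\} \uplus \Gamma'_{arg} \vdash e_i : \sigma_i'
$$

where, as in the statement of the theorem, $\Delta_{env};\ \Delta_{arg} \vdash_{env\text{-}type} \Gamma_{env} \Rightarrow \Gamma'_{env}$ and $\Delta_{env};\ \Delta_{arg} \vdash_{env\text{-}type} \Gamma_{arg} \Rightarrow \Gamma'_{arg}$, and where $\Delta_{env};\ \Delta_{arg} \vdash \sigma_i \Rightarrow \sigma_i'$. From lemma 6.4.5, we know that $\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg} \vdash \sigma_i' \equiv \{\mu_{env}/t_{env}\}\Gamma''_{env}(x_i)$. Therefore,

$$
\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg} \vdash (\sigma_1' \times \cdots \times \sigma_n') \equiv \{\mu_{env}/t_{env}\}|\Gamma''_{env}|.
$$

Thus, from the formation rule for closures, we can conclude that

$$
\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg};\ \{x_{env}:|\Gamma'_{env}|\} \uplus \Gamma'_{arg} \vdash \langle\!\langle e_c,\ \mu_{env},\ e_{env} \rangle\!\rangle : \{\mu_{env}/t_{env}\}(\sigma_a'' \rightarrow \sigma_b'').
$$

Suppose $\Delta_{env};\ \Delta_{arg} \vdash \sigma_a \Rightarrow \sigma_a'$ and $\Delta_{env};\ \Delta_{arg} \vdash \sigma_b \Rightarrow \sigma_b'$. Then by the type translation,

$$
\Delta_{env};\ \Delta_{arg} \vdash \sigma_a \rightarrow \sigma_b \Rightarrow \sigma_a' \rightarrow \sigma_b'.
$$

By lemma 6.4.5, we can conclude that

$$
\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg} \vdash \sigma_a' \rightarrow \sigma_b' \equiv \{\mu_{env}/t_{env}\}(\sigma_a'' \rightarrow \sigma_b'').
$$

Hence,

$$
\{t_{env}::|\Delta_{env}|\} \uplus \Delta_{arg};\ \{x_{env}:|\Gamma'_{env}|\} \uplus \Gamma'_{arg} \vdash \langle\!\langle e_c,\ \mu_{env},\ e_{env} \rangle\!\rangle : \sigma_a' \rightarrow \sigma_b'.
$$

□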

6.4.3  Correctness  of  the  Term  Translation 

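Correctness of the term translation follows a similar pattern to that of the constructor translation. I begin by establishing a set of relations for closed term values, indexed by closed source types.

$$
\frac{\begin{array}{c}
\emptyset;\ \emptyset \vdash e : \sigma \qquad \emptyset;\ \emptyset \vdash \sigma \Rightarrow \sigma' \qquad \emptyset;\ \emptyset \vdash e' : \sigma' \\
e \Downarrow v \ \text{ iff } \ e' \Downarrow v' \text{ and } v \approx_{\sigma} v'
\end{array}}
     {e \approx_{\sigma} e'}
$$

$$
i \approx_{int} i \qquad\qquad f \approx_{float} f \qquad\qquad () \approx_{unit} ()
$$

$$
\frac{\pi_1\, v \approx_{\sigma_1} \pi_1\, v' \qquad \pi_2\, v \approx_{\sigma_2} \pi_2\, v'}
     {v \approx_{\sigma_1 \times \sigma_2} v'}
$$

$$
\frac{v_1 \approx_{\sigma_1} v_1', \ldots, v_k \approx_{\sigma_k} v_k' \ \text{ implies } \ v\,[v_1, \ldots, v_k] \approx_{\sigma} v'\,[v_1', \ldots, v_k']}
     {v \approx_{[\sigma_1, \ldots, \sigma_k] \rightarrow \sigma} v'}
$$

$$
\frac{\mu \approx_{\kappa} \mu' \ \text{ implies } \ v\,[\mu] \approx_{\{\mu/t\}\sigma} v'\,[\mu']}
     {v \approx_{\forall t::\kappa.\sigma} v'}
$$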

Two expressions $e$ and $e'$ are related at source type $\sigma$ iff $e$ has type $\sigma$, $e'$ has a type $\sigma'$ obtained by translating $\sigma$, and $e$ evaluates to a value iff $e'$ evaluates to a related value at $\sigma$. Values at base type are related iff they are syntactically equal. Values at product type are related if projecting their components yields related computations. Values at arrow types are related when they yield related computations, given related arguments. Finally, two polymorphic values are related if they yield related computations given related constructors.

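For open expressions, I extend the relations to substitutions $\gamma$ and $\gamma'$, mapping variables to values, where the relations are indexed by a source type assignment $\Gamma$ as follows:

$$
\frac{v_1 \approx_{\sigma_1} v_1' \quad \cdots \quad v_n \approx_{\sigma_n} v_n'}
     {\{x_1 = v_1, \ldots, x_n = v_n\} \approx_{\{x_1:\sigma_1, \ldots, x_n:\sigma_n\}} \{x_{env} = (v_1', \ldots, v_n')\}}
$$

I relate pairs of substitutions, $\gamma_{env}; \gamma_{arg}$ and $\gamma'_{env}; \gamma'_{arg}$, as follows:

$$
\frac{\begin{array}{c}
\gamma_{env} \approx_{\Gamma_{env}} \gamma'_{env} \\
\mathrm{Dom}(\Gamma_{arg}) = \mathrm{Dom}(\gamma_{arg}) = \mathrm{Dom}(\gamma'_{arg}) \qquad \forall x \in \mathrm{Dom}(\Gamma_{arg}).\ \gamma_{arg}(x) \approx_{\Gamma_{arg}(x)} \gamma'_{arg}(x)
\end{array}}
     {\gamma_{env}; \gamma_{arg} \approx_{\Gamma_{env};\Gamma_{arg}} \gamma'_{env}; \gamma'_{arg}}
$$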

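The following lemma shows that the translation of variables is correct, and thus so is the translation of environments.

Lemma 6.4.7 Let $\delta_{env}; \delta_{arg} \approx_{\Delta_{env};\Delta_{arg}} \delta'_{env}; \delta'_{arg}$ and $\gamma_{env}; \gamma_{arg} \approx_{\delta_{env} \uplus \delta_{arg}(\Gamma_{env};\Gamma_{arg})} \gamma'_{env}; \gamma'_{arg}$.

1. If $\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash x : \sigma \Rightarrow e$, then $\gamma_{env} \uplus \gamma_{arg}(x) \approx_{\delta_{env} \uplus \delta_{arg}(\sigma)} \gamma'_{env} \uplus \gamma'_{arg}(e)$.

2. If $\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash_{env} \Gamma'_{env} \Rightarrow e_{env}$, then $\gamma'_{env} \uplus \gamma'_{arg}(e_{env}) \Downarrow v_{env}$ for some $v_{env}$ and $\{x = \gamma_{env} \uplus \gamma_{arg}(x) \mid x \in \mathrm{Dom}(\Gamma'_{env})\} \approx_{\delta_{env} \uplus \delta_{arg}(\Gamma'_{env})} \{x_{env} = v_{env}\}$.

With this lemma in hand, I can establish the correctness of the translation by showing that a $\lambda_i^{ML}$-Rep expression is always related to its $\lambda_i^{ML}$-Close translation, given appropriately related substitutions.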
Theorem 6.4.8 (Correctness) Let $\delta_{env}; \delta_{arg} \approx_{\Delta_{env};\Delta_{arg}} \delta'_{env}; \delta'_{arg}$ and let $\gamma_{env}; \gamma_{arg} \approx_{\delta_{env} \uplus \delta_{arg}(\Gamma_{env};\Gamma_{arg})} \gamma'_{env}; \gamma'_{arg}$. If $\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash e : \sigma \Rightarrow e'$, then $\delta_{env} \uplus \delta_{arg}(\gamma_{env} \uplus \gamma_{arg}(e)) \approx_{\delta_{env} \uplus \delta_{arg}(\sigma)} \delta'_{env} \uplus \delta'_{arg}(\gamma'_{env} \uplus \gamma'_{arg}(e'))$.

Proof: By induction on the derivation of $\Delta_{env};\ \Delta_{arg};\ \Gamma_{env};\ \Gamma_{arg} \vdash e : \sigma \Rightarrow e'$ (see Figure 6.3). The var-arg and var-env cases follow directly from lemma 6.4.7. The int, float, and unit rules follow trivially. The elimination rules proj, app, and tapp follow directly from the inductive hypotheses as well as the definitions of the relations. The typerec rule follows directly from constructor translation correctness, the inductive hypotheses, and the definition of the relations. Arguments for the abs and tabs rules follow.

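abs: Let $v_1 \approx_{\delta_{env} \uplus \delta_{arg}(\sigma_1)} v_1', \ldots, v_k \approx_{\delta_{env} \uplus \delta_{arg}(\sigma_k)} v_k'$ and let $\gamma''_{env} = \{x = \gamma_{env} \uplus \gamma_{arg}(x) \mid x \in \mathrm{Dom}(\Gamma'_{env})\}$. By lemma 6.4.7, $\gamma'_{env} \uplus \gamma'_{arg}(e_{env}) \Downarrow v_{env}$ for some $v_{env}$ and $\gamma''_{env} \approx_{\delta_{env} \uplus \delta_{arg}(\Gamma'_{env})} \{x_{env} = v_{env}\}$.

Let $\delta''_{env} = \{t = \delta_{env} \uplus \delta_{arg}(t) \mid t \in \mathrm{Dom}(\Delta'_{env})\}$ and let $\mu'_{env} = \delta'_{env} \uplus \delta'_{arg}(\mu_{env})$. By lemma 6.4.2, we know that $\delta''_{env} \approx_{\Delta'_{env}} \{t_{env} = \mu'_{env}\}$. Hence, $\delta''_{env}; \emptyset \approx_{\Delta'_{env};\emptyset} \{t_{env} = \mu'_{env}\}; \emptyset$. By the induction hypothesis and type preservation, we can conclude that

$$
\delta''_{env}(\gamma''_{env} \uplus \{x_1 = v_1, \ldots, x_k = v_k\}(e)) \approx_{\delta_{env} \uplus \delta_{arg}(\sigma)} \{t_{env} = \mu'_{env}\}(\{x_{env} = v_{env},\ x_1 = v_1', \ldots, x_k = v_k'\}(e')).
$$

Thus,

$$
\delta_{env} \uplus \delta_{arg}(\gamma_{env} \uplus \gamma_{arg}(\lambda[x_1:\sigma_1, \ldots, x_k:\sigma_k].\ e)) \approx_{\delta_{env} \uplus \delta_{arg}([\sigma_1, \ldots, \sigma_k] \rightarrow \sigma)} \delta'_{env} \uplus \delta'_{arg}(\gamma'_{env} \uplus \gamma'_{arg}(\langle\!\langle e_c,\ \mu_{env},\ e_{env} \rangle\!\rangle))
$$

where $e_c$ is $\mathrm{vcode}[t_{env}::|\Delta'_{env}|,\ x_{env}:|\Gamma''_{env}|,\ x_1:\sigma_1', \ldots, x_k:\sigma_k'].\ e'$.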

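tabs: Let $\mu \approx_{\kappa} \mu'$ and let $\gamma''_{env} = \{x = \gamma_{env} \uplus \gamma_{arg}(x) \mid x \in \mathrm{Dom}(\Gamma'_{env})\}$. By lemma 6.4.7, $\gamma'_{env} \uplus \gamma'_{arg}(e_{env}) \Downarrow v_{env}$ for some $v_{env}$ and $\gamma''_{env} \approx_{\delta_{env} \uplus \delta_{arg}(\Gamma'_{env})} \{x_{env} = v_{env}\}$.

Let $\delta''_{env} = \{t = \delta_{env} \uplus \delta_{arg}(t) \mid t \in \mathrm{Dom}(\Delta'_{env})\}$ and let $\mu'_{env} = \delta'_{env} \uplus \delta'_{arg}(\mu_{env})$. By lemma 6.4.2, we know that $\delta''_{env} \approx_{\Delta'_{env}} \{t_{env} = \mu'_{env}\}$. Hence, by the assumption regarding $\mu$ and $\mu'$, $\delta''_{env}; \{t = \mu\} \approx_{\Delta'_{env};\{t::\kappa\}} \{t_{env} = \mu'_{env}\}; \{t = \mu'\}$. By the induction hypothesis and type preservation, we can conclude that

$$
\delta''_{env} \uplus \{t = \mu\}(\gamma''_{env}(e)) \approx_{\delta_{env} \uplus \delta_{arg} \uplus \{t = \mu\}(\sigma)} \{t_{env} = \mu'_{env},\ t = \mu'\}(\{x_{env} = v_{env}\}(e')).
$$

Thus,

$$
\delta_{env} \uplus \delta_{arg}(\gamma_{env} \uplus \gamma_{arg}(\Lambda t::\kappa.\ e)) \approx_{\delta_{env} \uplus \delta_{arg}(\forall t::\kappa.\sigma)} \delta'_{env} \uplus \delta'_{arg}(\gamma'_{env} \uplus \gamma'_{arg}(\langle\!\langle e_c,\ \mu_{env},\ e_{env} \rangle\!\rangle))
$$

where $e_c$ is $\mathrm{tcode}[t_{env}::|\Delta'_{env}|,\ x_{env}:|\Gamma''_{env}|,\ t::\kappa].\ e'$. □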
6.5 Related Work

Closure conversion is discussed in descriptions of various functional language compilers [111, 78, 11, 9, 109]. It is closely related to $\lambda$-lifting [69] in that it eliminates free variables in the bodies of $\lambda$-abstractions. However, closure conversion differs by making the representation of the environment explicit as a data structure. Making the environment explicit is important because it exposes environment construction and variable lookup to an optimizer. Furthermore, Shao and Appel show that not all environment representations are “safe for space” [109], and thus choosing a good environment representation is an important part of compilation.

Wand and Steckler [124] have considered two optimizations of the basic closure conversion strategy (selective and lightweight closure conversion) and provide a correctness proof for each of these in an untyped setting. Hannan [54] recasts Wand’s work into a typed setting, and provides correctness proofs for Wand’s optimizations. As with my translation, Hannan’s translation is formulated as a deductive system. However, Hannan does not consider the important issue of environment representation (preferring an abstract account), nor does he consider the typing properties of the closure-converted code.

Minamide, Morrisett, and Harper give a comprehensive treatment of type-directed closure conversion for the simply-typed $\lambda$-calculus and a predicative, type-passing polymorphic $\lambda$-calculus [92, 91]. This chapter extends the initial treatment by showing how to closure convert a language like $\lambda_i^{ML}$ with higher kinds (i.e., functions at both the constructor and term levels).


Chapter 7

Types and Garbage Collection


In  the  previous  chapters,  I  argued  that  one  should  use  types  at  compile  time  to  direct 
the  translation  of  a  high-level  language  to  a  low-level  language.  In  this  chapter,  I  will 
show  that  types  can  be  used  at  run  time  to  implement  a  key  facility,  namely  automatic 
storage  reclamation  or  garbage  collection.  As  in  the  previous  chapters,  types  will  guide 
us  in  the  process  of  garbage  collection  as  well  as  a  proof  of  correctness. 

In most accounts of language implementation, garbage collection is either ignored or at best discussed without regard to the rest of the implementation. Most descriptions of garbage collectors are extremely low-level and concentrate on manipulating “mark bits”, “forwarding pointers”, “tags”, “reference counts”, and the like. This focus on the low-level details makes it extremely difficult to determine what effect a garbage collector has on a program’s evaluation. As a result, there are very few proofs that a garbage collector does not interfere with evaluation and only collects true garbage.

The primary culprit is that traditional models of evaluation based on the $\lambda$-calculus, such as the contextual semantics of Mini-ML and $\lambda_i^{ML}$, use substitution as the mechanism of computation. Unfortunately, substitution hides all memory management issues: during evaluation we simply $\alpha$-convert terms so that we can always find an unused variable. Since $\alpha$-conversion is defined in terms of substitution, it is substitution that implicitly “allocates” fresh variable names for us. Furthermore, when we substitute a value for a variable, if there are no occurrences of that variable, the value disappears. Thus, substitution also takes care of “collecting” unneeded terms.

In this chapter, I develop an alternative style of semantics where allocation is explicit. The basic idea is to represent a program’s memory or heap as a global set of syntactic declarations. The evaluation rules allocate large objects in the global heap and automatically dereference pointers to such objects when needed. Since the heap is explicit, the process of garbage collection is made explicit as any relation that removes portions of a program’s heap without affecting the program’s evaluation.

I specify a particular garbage collection strategy which characterizes the family of tracing garbage collectors including mark/sweep and copying collectors. By employing standard syntactic techniques, I prove that the specification is correct with respect to the definition of garbage.
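As a rough illustration of the tracing strategy, the sketch below models a heap as a dictionary from location names to records and keeps exactly the bindings reachable from a set of roots. The representation is an assumption made for this example, not the formal development of this chapter: locations are Python strings, other field values are integers, and the function name `collect` is hypothetical. Note that the Python run-time type of each field acts as a tag here, which is precisely what the tag-free approach described next avoids.

```python
def collect(heap, roots):
    """Return the sub-heap reachable from roots.

    heap maps a location (str) to a tuple of fields; a field that is a str
    is treated as a pointer to another location, anything else as plain data.
    """
    reachable = set()
    worklist = list(roots)
    while worklist:
        loc = worklist.pop()
        if loc in reachable:
            continue
        reachable.add(loc)
        for field in heap[loc]:
            if isinstance(field, str):  # the "tag" check
                worklist.append(field)
    return {loc: fields for loc, fields in heap.items() if loc in reachable}
```

With `heap = {"a": (1, "b"), "b": (2,), "c": ("a",)}` and roots `["a"]`, the binding for `"c"` is dropped: it points into the live data, but nothing live points to it.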

Next,  I  develop  an  algorithm  that  implements  the  tracing  garbage  collection  strategy. 
Standard  tracing  collectors  use  tags  on  values  in  the  heap  to  determine  their  shape  — 
the  size  of  the  object  and  any  pointers  contained  in  the  object.  Instead  of  using  tags,  I 
show  that  for  monomorphic  languages,  if  enough  type  information  is  recorded  on  terms 
at  compile  time,  types  can  be  used  to  determine  the  shape  of  objects  in  the  heap.  Con¬ 
sequently,  no  tags  are  required  on  the  values  in  the  heap  to  support  garbage  collection. 
This approach to tag-free garbage collection is not new [23], but my formulation is at a
sufficiently  high  level  that  it  is  easy  to  prove  its  correctness. 

I  then  show  how  to  extend  the  tag-free  collection  algorithm  to  accommodate  genera¬ 
tional  garbage  collection.  Generational  collection  is  an  important  technique  that  collects 
most  of  the  garbage  in  a  program  but  examines  a  smaller  set  of  objects  than  the  standard 
tracing  collection  algorithm.  Thus,  generational  collection  tends  to  improve  the  latency 
or  response  time  of  garbage  collection  without  sacrificing  too  much  space. 

After  showing  how  a  monomorphic  language  can  be  garbage  collected  in  a  tag-free 
fashion, I show how a type-passing, polymorphic language such as $\lambda_i^{ML}$ can be garbage
collected.  The  key  idea  is  to  use  constructors  that  are  passed  dynamically  as  arguments 
to  procedures  during  the  garbage  collection  process.  With  type  information  recorded 
at  compile  time,  this  allows  us  to  reconstruct  the  shape  of  all  objects.  Hence,  tag-free 
garbage  collection  is  another  mechanism  that  can  use  dynamic  type  dispatch  to  account 
for  variable  types.  As  for  monomorphic  languages,  this  approach  to  tag-free  garbage 
collection  for  polymorphic  languages  is  not  new  [119,  6,  96,  95],  but  my  formulation  is 
sufficiently  abstract  that  we  can  easily  prove  its  correctness. 

Tag-free  garbage  collection  is  important  for  two  very  practical  reasons:  first,  a  clever 
tag-free  implementation  can  avoid  manipulating  any  type  information  in  monomorphic 
code  at  run  time,  except  during  garbage  collection.  In  contrast,  a  tagging  implementa¬ 
tion  must  tag  values  as  they  are  created  and  possibly  untag  the  values  when  they  are 
examined.  The  overheads  of  manipulating  these  tags  during  the  computation  can  be 
considerable  [112]  and  implementors  go  to  great  lengths  and  use  many  clever  encodings 
to  minimize  these  overheads  [128].  Second,  tag-free  garbage  collection  supports  language 
and  system  interoperability.  In  particular,  many  ubiquitous  languages,  such  as  Fortran, 
C,  and  C++,  do  not  provide  automatic  memory  management  and,  thus,  do  not  tag  values. 
A  language  that  uses  tags  for  collection  must  strip  tags  off  values  before  passing  them 
to  library  routines  written  in  Fortran  or  C.  Similarly,  communicating  with  the  operating 
system,  windowing  system,  or  hardware  requires  matching  the  representations  dictated 
by  these  systems.  Since  tag-free  collection  places  no  constraints  on  the  representation  of 
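values, communicating with these systems is easier and more efficient.

\[
\begin{array}{lrcl}
\text{(types)} & \tau & ::= & \mathsf{int} \mid \mathsf{float} \mid \mathsf{unit} \mid \langle\tau_1 \times \tau_2\rangle \mid \mathsf{code}(\tau_1, \tau_2) \mid \tau_1 \to \tau_2 \\
\text{(expressions)} & e & ::= & \mathsf{return}\ x \mid \mathsf{if0}\ x\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 \mid \mathsf{let}\ x{:}\tau = d\ \mathsf{in}\ e \\
\text{(declarations)} & d & ::= & x \mid i \mid f \mid () \mid \langle x_1, x_2\rangle \mid \pi_1\, x \mid \pi_2\, x \mid \\
 & & & \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau].e \mid \langle\langle x_1, x_2\rangle\rangle \mid x\ x' \mid \\
 & & & \mathsf{eqint}(x_1, x_2) \mid \mathsf{eqfloat}(x_1, x_2)
\end{array}
\]

Figure 7.1: Syntax of Mono-GC Expressions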


7.1  Mono-GC 

In this section, I define a language called Mono-GC, which is derived from the monomorphic
subset of the $\lambda_i^{ML}$-Close language (see Chapter 6). The expressions of the language
are limited in the style of Flanagan et al. [42]. In particular, the language forbids nested
expressions  and  requires  that  all  values  and  computations  be  bound  to  variables.  These 
restrictions  simplify  the  presentation  of  the  semantics,  but  they  provide  many  of  the 
practical  benefits  of  CPS  [9]. 

The  syntax  of  Mono-GC  is  defined  in  Figure  7.1.  Types  are  monomorphic  and  include 
base  types,  products,  code,  and  arrow  types.  To  simplify  the  language,  I  only  consider 
functions  of  one  argument.  Expressions  (e)  return  a  variable,  branch  on  a  variable,  or 
bind  a  declaration  (d)  to  a  variable  and  continue  with  some  expression.  Declarations 
can  either  be  immediate  constants,  a  primitive  operation  (e.g.,  eqint)  applied  to  some 
variables,  a  tuple  whose  components  are  variables,  a  piece  of  code,  a  closure,  a  projection 
from  a  variable,  or  an  application  of  a  variable  to  a  variable.  As  in  Mono-CLOSE,  I 
require  that  code  always  be  closed. 

The expression $\mathsf{let}\ x{:}\tau = d\ \mathsf{in}\ e$ binds the variable $x$ within the scope of the expression
$e$. Similarly, the declaration $\mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau].e$ binds the variables $x_{env}$ and $x$ in
the scope of the expression $e$. I consider expressions to be equivalent up to $\alpha$-conversion
of the bound variables.

From  an  implementor’s  perspective,  the  variables  of  Mono-GC  correspond  to  abstract 
machine  registers.  In  contrast  to  a  semantics  based  on  substitution,  I  bind  values  to 
registers  and  then  compute  with  the  registers,  instead  of  substituting  the  contents  of  a 
register  within  a  term.  By  restricting  declarations  so  that  there  are  no  nested  expressions, 
I  greatly  simplify  the  process  of  breaking  an  expression  into  an  evaluation  context  and 
instruction.  Indeed,  the  evaluation  contexts  are  explicitly  tracked  via  a  program  stack 
(see below).

The other syntactic classes of Mono-GC are given in Figure 7.2. Mono-GC programs
have four components: a heap ($H$), a stack ($S$), a typed environment ($\rho$), and an expression
($e$). Informally, the heap holds values too large to fit into registers. The stack holds
a  list  of  delayed  computations,  essentially  as  closures,  that  are  waiting  for  a  function 
invocation  to  return.  The  environment  serves  as  the  “registers”  of  the  abstract  machine 
and  maps  variables  to  small  values.  Finally,  the  expression  corresponds  to  the  code  that 
the  machine  is  currently  executing. 

Formally,  a  heap  is  an  unordered  set  of  bindings  that  maps  locations  (l)  to  heap  values. 
A  heap  value  (h)  is  a  tuple  of  small  values,  a  piece  of  code,  or  a  closure.  Heap  values  are 
too  large  to  fit  into  registers  and  are  thus  bound  to  locations  in  memory.  Small  values  ( v ) 
are  values  that  can  fit  into  registers  and  consist  of  integer  and  floating  point  constants, 
unit, and locations. A typed environment ($\rho$) is an environment that maps variables to
both a type and a small value. I use $\rho(x)$ and $\rho_{type}(x)$ to denote the value and type to
which $\rho$ maps $x$, respectively. A stack ($S$) is a list of pairs of the form $[\rho, \lambda x{:}\tau.\, e]$. Each
pair represents a delayed computation where $\rho$ is the environment of the computation
and $\lambda x{:}\tau.\, e$ is the “continuation” of the delayed computation. I require that all the free
variables in $\lambda x{:}\tau.\, e$ be bound in the environment $\rho$. The stack could also be represented as
a  list  of  closures,  but  this  approach  models  a  system  where  stack  frames  are  not  allocated 
in  the  heap.  Composing  the  “closures”  of  the  stack  results  in  the  “continuation”  of  the 
program.  Finally,  I  distinguish  programs  with  an  empty  stack  and  return  x  expression 
as  answer  programs. 
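
As a rough illustration of these four components, the following Python sketch models a program as a record of heap, stack, environment, and expression. The names `Loc`, `Frame`, `Program`, and `is_answer`, and the encoding of expressions and heap values as tagged tuples, are assumptions made for this sketch and do not come from the formal development.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Loc:
    """A heap location l: a small value that points into the heap."""
    name: str


# A typed environment maps each variable to a (type, small value) pair.
Env = Dict[str, Tuple[Any, Any]]


@dataclass
class Frame:
    """A stack frame [rho, lambda x:tau. e]: saved environment plus continuation."""
    env: Env
    param: str
    param_type: Any
    body: Any


@dataclass
class Program:
    heap: Dict[Loc, Any]   # locations -> heap values (tuples, code, closures)
    stack: List[Frame]     # delayed computations, most recent frame last
    env: Env               # the "registers" of the abstract machine
    expr: Any              # the expression currently being executed


def is_answer(p: Program) -> bool:
    """An answer program has an empty stack and a `return x` expression."""
    return not p.stack and isinstance(p.expr, tuple) and p.expr[0] == "return"


# Hypothetical example: one tuple in the heap, one register, nothing pending.
prog = Program(
    heap={Loc("l1"): ("tuple", 1, 2)},
    stack=[],
    env={"x": ("int", 7)},
    expr=("return", "x"),
)
```

Keeping the stack as a separate list of frames, rather than as closures in the heap, mirrors the choice described above of modelling a system whose stack frames are not heap-allocated.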

Like  the  expression  level,  I  consider  code-expressions  to  bind  their  variable  arguments 
within the scope of their expression components. For a stack frame $[\rho, \lambda x{:}\tau.\, e]$, I consider
the domain of $\rho$ and $x$ to be bound within $e$. Finally, I consider the domain of the
heap to bind the locations within the scope of the range of the heap, the stack, and the
environment of a program. Thus, I consider programs to be equivalent up to $\alpha$-conversion
and  reordering  of  the  locations  bound  in  the  heap. 

Considering programs equivalent modulo $\alpha$-conversion and the treatment of heaps
and  environments  as  sets  instead  of  sequences  hides  many  of  the  complexities  of  memory 
management.  In  particular,  programs  are  automatically  considered  equivalent  if  the 
heap  or  environment  is  rearranged  and  locations  or  variables  are  renamed  as  long  as  the 
“graph”  of  the  program  is  preserved.  This  abstraction  allows  us  to  focus  on  the  issues  of 
determining  what  bindings  in  the  heap  are  garbage  without  specifying  how  such  bindings 
are  represented  in  a  real  machine. 

7.1.1  Dynamic  Semantics  of  Mono-GC 

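\[
\begin{array}{lrcl}
\text{(locations)} & l & & \\
\text{(small values)} & v & ::= & i \mid f \mid () \mid l \\
\text{(heap values)} & h & ::= & \langle v_1, v_2\rangle \mid \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau].e \mid \langle\langle v_{code}, v_{env}\rangle\rangle \\
\text{(heaps)} & H & ::= & \{l_1 = h_1, \ldots, l_n = h_n\} \\
\text{(environments)} & \rho & ::= & \{x_1{:}\tau_1{=}v_1, \ldots, x_n{:}\tau_n{=}v_n\} \\
\text{(stacks)} & S & ::= & [\,] \mid S[\rho, \lambda x{:}\tau.\, e] \\
\text{(programs)} & P & ::= & (H, S, \rho, e) \\
\text{(answers)} & A & ::= & (H, [\,], \rho, \mathsf{return}\ x)
\end{array}
\]

Figure 7.2: Syntax of Mono-GC Programs

Figure 7.3 defines the rewriting rules for Mono-GC programs. I briefly describe each rule
below: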


(1,2) An $\mathsf{if0}$ is applied to some variable $x$. We find the value of $x$ in the current environment
and select the appropriate expression according to whether this value is 0
or some other integer.

(3) A variable $x$ is bound to another variable $x'$ in a let expression. We look up the
small value $v$ to which $x'$ is bound in the current environment, $\rho$. We extend $\rho$ to
map $x$ to $v$ and continue with the body of the let.

(4,5)  The  integer  equality  primitive  is  applied  to  two  variables.  We  find  the  value  of  the 
variables  in  the  environment  and  return  1  or  0  as  appropriate.  This  new  value  is 
bound  to  the  let-bound  variable  in  the  new  environment.  The  rules  for  floating¬ 
point  equality  (not  shown)  are  similar. 

(6)  The  expression  binds  a  variable  to  an  immediate  small  value.  We  add  that  binding 
to  the  current  environment  and  continue. 

(7,8,9) The expression allocates a tuple, code, or a closure, binding the result to $x$. We
first replace all of the free variables in the object with their bindings from the
environment. Code objects are always closed, so they are not affected. Then, we
allocate a new location $l$ on the heap and bind the heap value to this location.
Next, we map $x$ to $l$ in the current environment and continue.

(10) The expression projects the $i$th component of $y$, binding the result to $x$. We look up
$y$ in the environment and find that it is bound to the location $l$. We dereference $l$ in
the heap and find that it is bound to a heap value $\langle v_1, v_2\rangle$. We bind $x$ to $v_i$ in the
environment and continue.

(11) The expression applies $x_1$ to some argument $x_2$, binding the result to $x$. We look up
$x_1$ and $x_2$ in the environment, finding that $x_1$ is bound to some location $l$ and $x_2$
is bound to $v$. We dereference $l$ in the heap and find that it is bound to a closure,
$\langle\langle l_{code}, v_{env}\rangle\rangle$, where $l_{code}$ is bound to the code $\mathsf{vcode}[x_{env}{:}\tau_{env}, y{:}\tau'].e$. We form a
new stack frame by pairing the current environment ($\rho$) and the current expression,
abstracting the result of the function, $\lambda x{:}\tau.\, e'$. This frame is pushed on the stack.
Then, we install the environment which maps $x_{env}$ to $v_{env}$ and $y$ to $v$. We then
continue with the body of the closure as the current expression.

(12) The current expression is $\mathsf{return}\ x'$ and the stack is non-empty. We look up the
value of $x'$ in the current environment ($\rho'$), pop off a stack frame, install its environment
($\rho$) as the current environment, bind the small value $\rho'(x')$ to the argument
of the continuation ($x$), and continue with the body of the continuation.

In this formulation, each time a let-expression is evaluated, the type ascribed to the
bound variable is entered into the current environment $\rho$, as well as the value. In essence,
the environment contains a type assignment $\Gamma$ that is constructed on the fly. It is possible
to  avoid  constructing  these  type  assignments  at  run  time  by  labelling  let-expressions  not 
only  with  the  type  of  the  bound  variable,  but  also  with  the  types  of  all  variables  in  scope. 
This  allows  evaluation  to  simply  discard  the  current  type  assignment  and  proceed  with 
the  assignment  labelling  the  expression.  Since  these  assignments  can  be  calculated  at 
compile  time,  no  assignment  construction  need  occur  at  run  time. 

Of  course,  labelling  each  expression  with  the  types  of  all  variables  in  scope  could 
take  a  great  deal  of  space.  But,  as  I  will  show,  this  type  information  is  only  used 
during garbage collection. Most language implementations allow garbage collection
to occur only at certain points during evaluation. For example, the garbage
collector  of  SML/NJ  is  only  invoked  at  the  point  when  a  function  is  called,  or  at  the  point 
when  an  array  or  vector  is  allocated.  Hence,  we  only  need  to  record  type  assignments  for 
those  let-expressions  that  perform  a  function  call  or  array  allocation.  This  guarantees 
that  when  we  invoke  garbage  collection,  enough  type  information  is  present  to  do  the 
job. 
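
The rewriting rules described in this section can be sketched as a small interpreter. The code below is only an illustration under several assumptions: expressions, declarations, and heap values are encoded as tagged Python tuples, types are opaque labels, the floating-point equality rules are omitted, and the names `Loc`, `alloc`, `step`, `run`, and `example` are invented for the sketch. Each branch is annotated with the rule it is meant to mirror.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Loc:
    n: int


_fresh = count()


def alloc(heap, hval):
    """Bind a heap value to a fresh location; return (extended heap, location)."""
    l = Loc(next(_fresh))
    return {**heap, l: hval}, l


def step(heap, stack, env, expr):
    """Perform one rewriting step on the program (heap, stack, env, expr)."""
    val = lambda x: env[x][1]                       # rho(x)
    if expr[0] == "if0":                            # rules 1 and 2
        _, x, e1, e2 = expr
        return heap, stack, env, (e1 if val(x) == 0 else e2)
    if expr[0] == "return":                         # rule 12: pop a frame
        saved_env, param, ptype, body = stack[-1]
        return heap, stack[:-1], {**saved_env, param: (ptype, val(expr[1]))}, body
    _, x, tau, d, body = expr                       # otherwise: let x:tau = d in body
    kind = d[0]
    if kind == "var":                               # rule 3
        v = val(d[1])
    elif kind == "eqint":                           # rules 4 and 5
        v = 1 if val(d[1]) == val(d[2]) else 0
    elif kind == "const":                           # rule 6
        v = d[1]
    elif kind == "tuple":                           # rule 7
        heap, v = alloc(heap, ("tuple", val(d[1]), val(d[2])))
    elif kind == "code":                            # rule 8: code is closed
        heap, v = alloc(heap, d)
    elif kind == "closure":                         # rule 9
        heap, v = alloc(heap, ("closure", val(d[1]), val(d[2])))
    elif kind == "proj":                            # rule 10: d = ("proj", i, y)
        v = heap[val(d[2])][d[1]]
    elif kind == "app":                             # rule 11: push a frame
        _, lcode, venv = heap[val(d[1])]
        _, xenv, tenv, y, ty, cbody = heap[lcode]
        frame = (env, x, tau, body)
        return heap, stack + [frame], {xenv: (tenv, venv), y: (ty, val(d[2]))}, cbody
    else:
        raise ValueError(f"unknown declaration {d!r}")
    return heap, stack, {**env, x: (tau, v)}, body


def run(expr):
    """Rewrite until an answer (empty stack, `return x`) is reached."""
    state = ({}, [], {}, expr)
    while not (state[3][0] == "return" and not state[1]):
        state = step(*state)
    heap, _, env, (_, x) = state
    return env[x][1], heap


# Hypothetical program: build a closure that pairs its argument with itself
# and returns the first component, then apply it to 5.
example = (
    "let", "e", "unit", ("const", ()),
    ("let", "c", "code",
     ("code", "xenv", "unit", "y", "int",
      ("let", "p", "pair", ("tuple", "y", "y"),
       ("let", "r", "int", ("proj", 1, "p"), ("return", "r")))),
     ("let", "f", "arrow", ("closure", "c", "e"),
      ("let", "a", "int", ("const", 5),
       ("let", "z", "int", ("app", "f", "a"), ("return", "z"))))))

result, final_heap = run(example)
```

Note that nothing in `step` ever removes a heap binding, so the final heap still holds the code, the closure, and the tuple even though none of them is needed by the answer.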

7.1.2  Static  Semantics  of  Mono-GC 

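The static semantics of Mono-GC is described via a family of judgments. The first two
judgments, $\Gamma \vdash_{exp} e : \tau$ and $\Gamma \vdash_{dec} d : \tau$, give types to expressions and declarations,
respectively, in the context of a variable type assignment $\Gamma$. These judgments are derived
via the conventional axioms and inference rules of Figure 7.4, ignoring the equality
primitives.

\[
\begin{array}{rl}
1. & (H, S, \rho, \mathsf{if0}\ x\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2) \longmapsto (H, S, \rho, e_1) \qquad (\rho(x) = 0) \\[4pt]
2. & (H, S, \rho, \mathsf{if0}\ x\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2) \longmapsto (H, S, \rho, e_2) \qquad (\rho(x) = i \text{ and } i \neq 0) \\[4pt]
3. & (H, S, \rho, \mathsf{let}\ x{:}\tau = x'\ \mathsf{in}\ e) \longmapsto (H, S, \rho \uplus \{x{:}\tau{=}\rho(x')\}, e) \\[4pt]
4. & (H, S, \rho, \mathsf{let}\ x{:}\tau = \mathsf{eqint}(x_1, x_2)\ \mathsf{in}\ e) \longmapsto (H, S, \rho \uplus \{x{:}\tau{=}1\}, e) \qquad (\rho(x_1) =_{int} \rho(x_2)) \\[4pt]
5. & (H, S, \rho, \mathsf{let}\ x{:}\tau = \mathsf{eqint}(x_1, x_2)\ \mathsf{in}\ e) \longmapsto (H, S, \rho \uplus \{x{:}\tau{=}0\}, e) \qquad (\rho(x_1) \neq_{int} \rho(x_2)) \\[4pt]
6. & (H, S, \rho, \mathsf{let}\ x{:}\tau = v\ \mathsf{in}\ e) \longmapsto (H, S, \rho \uplus \{x{:}\tau{=}v\}, e) \\[4pt]
7. & (H, S, \rho, \mathsf{let}\ x{:}\tau = \langle x_1, x_2\rangle\ \mathsf{in}\ e) \longmapsto (H \uplus \{l = \langle\rho(x_1), \rho(x_2)\rangle\}, S, \rho \uplus \{x{:}\tau{=}l\}, e) \\[4pt]
8. & (H, S, \rho, \mathsf{let}\ x{:}\tau = \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau'].e'\ \mathsf{in}\ e) \longmapsto \\
   & \qquad (H \uplus \{l = \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau'].e'\}, S, \rho \uplus \{x{:}\tau{=}l\}, e) \\[4pt]
9. & (H, S, \rho, \mathsf{let}\ x{:}\tau = \langle\langle x_{code}, x_{env}\rangle\rangle\ \mathsf{in}\ e) \longmapsto \\
   & \qquad (H \uplus \{l = \langle\langle\rho(x_{code}), \rho(x_{env})\rangle\rangle\}, S, \rho \uplus \{x{:}\tau{=}l\}, e) \\[4pt]
10. & (H, S, \rho, \mathsf{let}\ x{:}\tau = \pi_i\, y\ \mathsf{in}\ e) \longmapsto (H, S, \rho \uplus \{x{:}\tau{=}v_i\}, e) \\
   & \qquad \text{where } \rho(y) = l \text{ and } H(l) = \langle v_1, v_2\rangle \quad (1 \leq i \leq 2) \\[4pt]
11. & (H, S, \rho, \mathsf{let}\ x{:}\tau = x_1\ x_2\ \mathsf{in}\ e') \longmapsto (H, S[\rho, \lambda x{:}\tau.\, e'], \{x_{env}{:}\tau_{env}{=}v_{env},\ y{:}\tau'{=}\rho(x_2)\}, e) \\
   & \qquad \text{where } \rho(x_1) = l \text{ and } H(l) = \langle\langle l_{code}, v_{env}\rangle\rangle \text{ and } H(l_{code}) = \mathsf{vcode}[x_{env}{:}\tau_{env}, y{:}\tau'].e \\[4pt]
12. & (H, S[\rho, \lambda x{:}\tau.\, e], \rho', \mathsf{return}\ x') \longmapsto (H, S, \rho \uplus \{x{:}\tau{=}\rho'(x')\}, e)
\end{array}
\]

Figure 7.3: Rewriting Rules for Mono-GC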

The static semantics for Mono-GC programs requires six more judgments that are
characterized as follows. I use $\Psi$ to range over location type assignments, which map
locations to types.

\[
\begin{array}{ll}
\Psi \vdash_{val} v : \tau & v \text{ is a well-formed small value of type } \tau \\
\Psi \vdash_{hval} h : \tau & h \text{ is a well-formed heap value of type } \tau \\
\Psi' \vdash_{heap} H : \Psi & H \text{ is a well-formed heap described by } \Psi \\
\Psi \vdash_{env} \rho : \Gamma & \rho \text{ is a well-formed environment described by } \Gamma \\
\Psi \vdash_{stack} S : \tau_1 \to \tau_2 & S \text{ is a well-formed stack of type } \tau_1 \to \tau_2 \\
\vdash_{prog} P : \tau & P \text{ is a well-formed program of type } \tau
\end{array}
\]

These judgments are defined by the inference rules and axioms of Figure 7.5. A program
has type $\tau$ if the following requirements are met: The program’s heap can be described by
$\Psi$ under no assumptions and is thus closed. The program’s stack maps $\tau'$ to $\tau$ under the
assumptions of $\Psi$. The program’s environment is described by $\Gamma$ under the assumptions
of $\Psi$. Finally, the program’s expression has the type $\tau'$ under the assumptions of $\Gamma$.

A heap $H$ is described by $\Psi$ under the assumptions $\Psi'$ if for all locations $l$ in $\Psi$,
$H(l)$ has the type $\Psi(l)$ under the assumptions of both $\Psi$ and $\Psi'$. This circularity in the
definition allows cycles in the heap, in much the same way that a typing rule for fix allows
a circular definition of a recursive function. A stack has type $\tau_1 \to \tau_2$ if the composition
of its closure components yields a function from $\tau_1$ values to $\tau_2$ values. A closure has
type $\tau' \to \tau$ if its environment has type $\tau_{env}$ and its code has type $\mathsf{code}(\tau_{env}, \tau' \to \tau)$.


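Expressions:

\begin{gather*}
\text{(var)}\quad \Gamma \uplus \{x{:}\tau\} \vdash_{var} x : \tau
\qquad
\text{(var-e)}\quad \dfrac{\Gamma \vdash_{var} x : \tau}{\Gamma \vdash_{exp} \mathsf{return}\ x : \tau} \\[6pt]
\text{(if0-e)}\quad \dfrac{\Gamma \vdash_{var} x : \mathsf{int} \qquad \Gamma \vdash_{exp} e_1 : \tau \qquad \Gamma \vdash_{exp} e_2 : \tau}{\Gamma \vdash_{exp} \mathsf{if0}\ x\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2 : \tau} \\[6pt]
\text{(let-e)}\quad \dfrac{\Gamma \vdash_{dec} d : \tau' \qquad \Gamma \uplus \{x{:}\tau'\} \vdash_{exp} e : \tau}{\Gamma \vdash_{exp} \mathsf{let}\ x{:}\tau' = d\ \mathsf{in}\ e : \tau}
\end{gather*}

Declarations:

\begin{gather*}
\text{(var-d)}\quad \Gamma \uplus \{x{:}\tau\} \vdash_{dec} x : \tau
\qquad
\text{(int-d)}\quad \Gamma \vdash_{dec} i : \mathsf{int} \\[6pt]
\text{(float-d)}\quad \Gamma \vdash_{dec} f : \mathsf{float}
\qquad
\text{(unit-d)}\quad \Gamma \vdash_{dec} () : \mathsf{unit} \\[6pt]
\text{(tuple-d)}\quad \dfrac{\Gamma \vdash_{var} x_1 : \tau_1 \qquad \Gamma \vdash_{var} x_2 : \tau_2}{\Gamma \vdash_{dec} \langle x_1, x_2\rangle : \langle\tau_1 \times \tau_2\rangle}
\qquad
\text{(proj-d)}\quad \dfrac{\Gamma \vdash_{var} x : \langle\tau_1 \times \tau_2\rangle}{\Gamma \vdash_{dec} \pi_i\, x : \tau_i}\ (1 \leq i \leq 2) \\[6pt]
\text{(vcode-d)}\quad \dfrac{\{x_{env}{:}\tau_{env},\ x{:}\tau'\} \vdash_{exp} e : \tau}{\Gamma \vdash_{dec} \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau'].e : \mathsf{code}(\tau_{env}, \tau' \to \tau)} \\[6pt]
\text{(close-d)}\quad \dfrac{\Gamma \vdash_{var} x_{code} : \mathsf{code}(\tau_{env}, \tau) \qquad \Gamma \vdash_{var} x_{env} : \tau_{env}}{\Gamma \vdash_{dec} \langle\langle x_{code}, x_{env}\rangle\rangle : \tau} \\[6pt]
\text{(app-d)}\quad \dfrac{\Gamma \vdash_{var} x_1 : \tau' \to \tau \qquad \Gamma \vdash_{var} x_2 : \tau'}{\Gamma \vdash_{dec} x_1\ x_2 : \tau}
\end{gather*}

Figure 7.4: Static Semantics of Mono-GC Expressions

Values:

\begin{gather*}
\text{(loc-v)}\quad \Psi \uplus \{l{:}\tau\} \vdash_{val} l : \tau
\qquad
\text{(int-v)}\quad \Psi \vdash_{val} i : \mathsf{int} \\[6pt]
\text{(float-v)}\quad \Psi \vdash_{val} f : \mathsf{float}
\qquad
\text{(unit-v)}\quad \Psi \vdash_{val} () : \mathsf{unit}
\end{gather*}

Heap Values:

\begin{gather*}
\text{(tuple-h)}\quad \dfrac{\Psi \vdash_{val} v_1 : \tau_1 \qquad \Psi \vdash_{val} v_2 : \tau_2}{\Psi \vdash_{hval} \langle v_1, v_2\rangle : \langle\tau_1 \times \tau_2\rangle} \\[6pt]
\text{(code-h)}\quad \dfrac{\emptyset \vdash_{dec} \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau'].e : \mathsf{code}(\tau_{env}, \tau' \to \tau)}{\Psi \vdash_{hval} \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau'].e : \mathsf{code}(\tau_{env}, \tau' \to \tau)} \\[6pt]
\text{(close-h)}\quad \dfrac{\Psi \vdash_{val} v_{code} : \mathsf{code}(\tau_{env}, \tau) \qquad \Psi \vdash_{val} v_{env} : \tau_{env}}{\Psi \vdash_{hval} \langle\langle v_{code}, v_{env}\rangle\rangle : \tau}
\end{gather*}

Heaps:

\begin{gather*}
\text{(heap)}\quad \dfrac{\forall l \in Dom(\Psi).\ \Psi' \uplus \Psi \vdash_{hval} H(l) : \Psi(l)}{\Psi' \vdash_{heap} H : \Psi}
\end{gather*}

Environments:

\begin{gather*}
\text{(env)}\quad \dfrac{\Psi \vdash_{val} v_1 : \tau_1 \quad \cdots \quad \Psi \vdash_{val} v_n : \tau_n}{\Psi \vdash_{env} \{x_1{:}\tau_1{=}v_1, \ldots, x_n{:}\tau_n{=}v_n\} : \{x_1{:}\tau_1, \ldots, x_n{:}\tau_n\}}\ (x_1, \ldots, x_n \text{ unique})
\end{gather*}

Stacks:

\begin{gather*}
\text{(empty-stack)}\quad \Psi \vdash_{stack} [\,] : \tau \to \tau \\[6pt]
\text{(push-stack)}\quad \dfrac{\Psi \vdash_{stack} S : \tau_2 \to \tau_3 \qquad \Psi \vdash_{env} \rho : \Gamma \qquad \Gamma \uplus \{x{:}\tau_1\} \vdash_{exp} e : \tau_2}{\Psi \vdash_{stack} S[\rho, \lambda x{:}\tau_1.\, e] : \tau_1 \to \tau_3}
\end{gather*}

Programs:

\begin{gather*}
\text{(prog)}\quad \dfrac{\emptyset \vdash_{heap} H : \Psi \qquad \Psi \vdash_{stack} S : \tau' \to \tau \qquad \Psi \vdash_{env} \rho : \Gamma \qquad \Gamma \vdash_{exp} e : \tau'}{\vdash_{prog} (H, S, \rho, e) : \tau}
\end{gather*}

Figure 7.5: Static Semantics of Mono-GC Programs

Lemma 7.1.1 (Extension) If $(H, S, \rho, e) \longmapsto (H', S', \rho', e')$, then:

1. $H' = H \uplus H''$ for some $H''$.

2. If $S = S'$, then $\rho' = \rho \uplus \rho''$ for some $\rho''$.

Theorem 7.1.2 (Preservation) If $\vdash_{prog} P : \tau$ and $P \longmapsto P'$, then $\vdash_{prog} P' : \tau$.

Theorem 7.1.3 (Canonical Forms) Suppose $\vdash_{prog} (H, [\,], \rho, \mathsf{return}\ x) : \tau$. Then if $\tau$
is:

• $\mathsf{int}$, then $\rho(x) = i$ for some $i$.

• $\mathsf{float}$, then $\rho(x) = f$ for some $f$.

• $\mathsf{unit}$, then $\rho(x) = ()$.

• $\langle\tau_1 \times \tau_2\rangle$, then $\rho(x) = l$ and $H(l) = \langle v_1, v_2\rangle$ for some $l$, $v_1$, and $v_2$.

• $\mathsf{code}(\tau_{env}, \tau_1 \to \tau_2)$, then $\rho(x) = l$ and $H(l) = \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau_1].e$ for some $l$,
$x_{env}$, $x$, and $e$.

• $\tau_1 \to \tau_2$, then $\rho(x) = l$, $H(l) = \langle\langle l_{code}, l_{env}\rangle\rangle$, and
$H(l_{code}) = \mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau_1].e$,
for some $l$, $l_{code}$, $l_{env}$, $x_{env}$, $\tau_{env}$, $x$, and $e$.

Theorem 7.1.4 (Progress) If $\vdash_{prog} P : \tau$, then either $P = A$ for some answer $A$ or else
there exists some $P'$ such that $P \longmapsto P'$.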


7.2  Abstract  Garbage  Collection 

Since  the  semantics  of  Mono-GC  makes  the  allocation  of  values  explicit,  I  can  define 
what it means to “garbage collect” a value in the heap. A binding $l = h$ in the heap
of a program is garbage if removing the binding produces an “equivalent” program. To
simplify the presentation, I will focus on programs that return an integer (i.e., are of type
int) and use Kleene equivalence to compare programs.

Definition 7.2.1 (Kleene Equivalence) $P_1 \simeq P_2$ means $P_1 \Downarrow (H_1, [\,], \rho_1, \mathsf{return}\ x_1)$
iff $P_2 \Downarrow (H_2, [\,], \rho_2, \mathsf{return}\ x_2)$ and $\rho_1(x_1) = \rho_2(x_2) = i$ for some integer $i$.

Definition 7.2.2 (Heap Garbage) If $P = (H \uplus \{l{=}h\}, S, \rho, e)$, and $\vdash_{prog} P : \mathsf{int}$, then
the binding $l{=}h$ is garbage in $P$ iff $P \simeq (H, S, \rho, e)$ and $\vdash_{prog} (H, S, \rho, e) : \mathsf{int}$.

A  collection  of  a  well-typed  program  P  is  obtained  by  dropping  a  (possibly  empty) 
set  of  garbage  bindings  from  the  heap  of  P,  resulting  in  a  well-typed  program  P'.  This 
definition  is  very  weak  in  that  it  only  allows  us  to  drop  bindings  in  the  heap  and  precludes 
other  program  transformations  including  modifications  to  the  stack,  environment,  or 
current  expression.  It  even  precludes  changing  a  heap  value  to  some  other  heap  value.  A 
garbage  collector  is  a  rewriting  rule  that  computes  a  collection  of  a  program. 

Many  garbage  collectors  attempt  to  collect  more  garbage  than  is  allowed  by  this  def¬ 
inition  by  modifying  the  stack,  the  environment,  or  values  in  the  heap  in  some  simple 
manner. For example, many collectors drop unneeded bindings in the current environment
or environments on the stack. This technique is known as “black-holing.” A
very few collectors reclaim bindings by remapping locations from one heap value to an
already existing, equivalent heap value. This is known as hash-consing in the garbage
collection literature.

However, the definition I have given accurately models what conventional garbage
collectors try to do. In fact, as I will show, the family of tracing garbage collectors,
typified by mark/sweep and copying collectors, have the nice property that they always
collect as much as this definition of garbage allows, but no more.¹ Consequently, tracing
collectors  are  optimal  with  respect  to  this  definition  of  garbage. 

Abstractly,  tracing  collectors  simply  drop  bindings  in  the  heap  that  are  not  “reach- 
able”  from  the  stack  or  the  current  environment.  I  define  reachability  in  terms  of  the 
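free locations of program components, denoted $FL(-)$:

\[
\begin{array}{rcl}
FL_{val}(l) & = & \{l\} \\
FL_{val}(v) & = & \emptyset \qquad (v = i,\ f, \text{ or } ()) \\
FL_{hval}(\langle v_1, v_2\rangle) & = & FL_{val}(v_1) \cup FL_{val}(v_2) \\
FL_{hval}(\mathsf{vcode}[x_{env}{:}\tau_{env}, x{:}\tau].e) & = & \emptyset \\
FL_{hval}(\langle\langle v_{code}, v_{env}\rangle\rangle) & = & FL_{val}(v_{code}) \cup FL_{val}(v_{env}) \\
FL_{heap}(\{l_1 = h_1, \ldots, l_n = h_n\}) & = & \left(\bigcup_{i=1}^{n} FL_{hval}(h_i)\right) \setminus \{l_1, \ldots, l_n\} \\
FL_{env}(\{x_1{:}\tau_1{=}v_1, \ldots, x_n{:}\tau_n{=}v_n\}) & = & \bigcup_{i=1}^{n} FL_{val}(v_i) \\
FL_{stack}([\,]) & = & \emptyset \\
FL_{stack}(S[\rho, \lambda x{:}\tau.\, e]) & = & FL_{stack}(S) \cup FL_{env}(\rho) \\
FL_{prog}(H, S, \rho, e) & = & (FL_{heap}(H) \cup FL_{stack}(S) \cup FL_{env}(\rho)) \setminus Dom(H)
\end{array}
\]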

A  tracing  collector  is  any  collector  that  drops  bindings  in  the  heap  but  does  not  leave 
any  free  locations.  I  represent  this  specification  as  a  new  rewriting  rule  as  follows: 

Definition 7.2.3 (Tracing Collector) $(H \uplus H', S, \rho, e) \overset{trace}{\longmapsto} (H, S, \rho, e)$ if and only if
$FL_{prog}(H, S, \rho, e) = \emptyset$.
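
The free-location functions and the side condition of the tracing rule translate almost line for line into code. The following Python sketch is an illustration only: heap values are assumed to be tagged tuples (`"tuple"`, `"closure"`, `"code"`), environments map variables to `(type, value)` pairs, stack frames are tuples whose first component is the saved environment, and all function names are invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    n: int


def fl_val(v):
    """FL_val: a location is free in itself; other small values have none."""
    return {v} if isinstance(v, Loc) else set()


def fl_hval(h):
    """FL_hval: tuples and closures contribute their two components; code is closed."""
    if h[0] == "code":
        return set()
    return fl_val(h[1]) | fl_val(h[2])


def fl_heap(heap):
    """Locations mentioned by heap values but not bound by the heap itself."""
    return set().union(*(fl_hval(h) for h in heap.values())) - set(heap)


def fl_env(env):
    return set().union(*(fl_val(v) for _, v in env.values()))


def fl_stack(stack):
    return set().union(*(fl_env(frame_env) for frame_env, *_ in stack))


def fl_prog(heap, stack, env):
    return (fl_heap(heap) | fl_stack(stack) | fl_env(env)) - set(heap)


def is_tracing_step(heap, dropped, stack, env):
    """Dropping `dropped` is a tracing step iff the remaining program is closed."""
    kept = {l: h for l, h in heap.items() if l not in dropped}
    return not fl_prog(kept, stack, env)


# Hypothetical heap: a -> b is reachable from the environment, c is not.
a, b, c = Loc(0), Loc(1), Loc(2)
heap = {a: ("tuple", 1, b), b: ("tuple", 2, 3), c: ("tuple", 4, 5)}
env = {"x": ("pair", a)}
```

In this example, dropping `c` satisfies the rule, while dropping `b` does not, because the binding for `a` would then mention a location that is no longer bound.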

I  must  show  that  tracing  garbage  collection  is  indeed  a  garbage  collector  in  that,  when 
given  a  program,  it  always  produces  a  Kleene-equivalent  program.  The  keys  to  a  simple, 
syntactic  proof  of  correctness  are  Postponement  and  Diamond  Lemmas.  The  statements 
of  these  lemmas  can  be  summarized  by  the  diagrams  of  Figure  7.6,  respectively,  where 
solid  arrows  denote  relations  that  are  assumed  to  exist  and  dashed  arrows  denote  relations 
that  can  be  derived  from  the  assumed  relations. 

¹ In an untyped setting where collections are not required to be closed programs, it is undecidable
whether or not a given binding in an arbitrary program is garbage [96, 95]. This more general notion of
garbage can be recovered in the typed setting by allowing locations to be rebound in the heap.

Figure 7.6: Postponement and Diamond Properties


With  the  Postponement  and  Diamond  Lemmas,  it  is  straightforward  to  show  that 
tracing  garbage  collection  does  not  affect  evaluation. 

Theorem 7.2.6 (Correctness of Tracing Collection) If $P \overset{trace}{\longmapsto} P'$, then $P'$ is a
collection of $P$.

Proof: Let $P = (H_1 \uplus H_2, S, \rho, e)$ and let $P' = (H_1, S, \rho, e)$ such that $P \overset{trace}{\longmapsto} P'$. I
must show $P$ evaluates to an integer answer iff $P'$ evaluates to the same integer. Suppose
$P' \Downarrow (H, [\,], \rho', \mathsf{return}\ x)$ and $\rho'(x) = i$. By induction on the number of rewriting steps
using the Postponement Lemma, I can show that $P \Downarrow (H \uplus H_2, [\,], \rho', \mathsf{return}\ x)$, and
clearly $\rho'(x) = i$.

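Now suppose $P \Downarrow (H, [\,], \rho', \mathsf{return}\ x)$ and $\rho'(x) = i$. By induction on the number of rewriting
steps using the Diamond Lemma, we know that there exists a $P''$ such that $P' \Downarrow P''$ and
$(H, [\,], \rho', \mathsf{return}\ x) \overset{trace}{\longmapsto} P''$. Thus, $P'' = (H'', [\,], \rho', \mathsf{return}\ x)$ with $H = H'' \uplus H'$ for some $H'$, and $\rho'(x) = i$, so
both $P$ and $P'$ compute the same answer. □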

This theorem shows that a single application of tracing collection results in a Kleene-equivalent
program. A real implementation interleaves garbage collection with evaluation.
Let $R$ stand for the standard set of rewriting rules (see Figure 7.3) and let $T$ stand
for this set with the tracing garbage collection rule. The following theorem shows that
evaluation under $R$ and $T$ is equivalent.

Theorem 7.2.7 For all $P$, $P \Downarrow_R (H, [\,], \rho, \mathsf{return}\ x)$ iff $P \Downarrow_T (H', [\,], \rho, \mathsf{return}\ x)$.


Proof: Clearly any evaluation under the $R$ rules can be simulated by the $T$ rules
simply by not performing any collection steps. Now suppose $P \Downarrow_T (H_1, [\,], \rho_1, \mathsf{return}\ x_1)$
and $\rho_1(x_1) = i$. Then there exists a finite rewriting sequence using $T$ as follows:

\[
P \overset{T}{\longmapsto} P_1 \overset{T}{\longmapsto} P_2 \overset{T}{\longmapsto} \cdots \overset{T}{\longmapsto} (H_1, [\,], \rho_1, \mathsf{return}\ x_1)
\]

I can show by induction on the number of rewriting steps in this sequence, using the
Postponement Lemma, that all garbage collection steps can be performed at the end of
the evaluation sequence. This provides us with an alternative evaluation sequence where
all the $R$ steps are performed at the beginning:

\[
P \overset{R}{\longmapsto} P'_1 \overset{R}{\longmapsto} P'_2 \overset{R}{\longmapsto} \cdots \overset{R}{\longmapsto} P'_n
\overset{trace}{\longmapsto} P'_{n+1} \overset{trace}{\longmapsto} P'_{n+2} \overset{trace}{\longmapsto} \cdots \overset{trace}{\longmapsto} (H_1, [\,], \rho_1, \mathsf{return}\ x_1)
\]

Since collection does not affect the expression part of a program and only removes bindings
from the heap, $P'_n = (H_1 \uplus H_2, [\,], \rho_1, \mathsf{return}\ x_1)$ for some $H_2$. Therefore, $P \Downarrow_R P'_n$.
Thus, any evaluation under $T$ can be simulated by an evaluation under $R$. □

Finally,  I  can  prove  that  tracing  garbage  collection  is  optimal  with  respect  to  my 
definition  of  garbage  in  the  sense  that  it  can  collect  as  much  garbage  as  any  other  collector 
can. 


Theorem 7.2.8 (Tracing Collection Optimal) If $P'$ is a collection of a well-typed
program $P$, then $P \overset{trace}{\longmapsto} P'$.

Proof: Let $P = (H \uplus H', S, \rho, e)$ and suppose $P' = (H, S, \rho, e)$ is a collection of $P$. By
the definition of heap garbage, $P'$ is well-typed. Hence, $P'$ is closed and $FL_{prog}(P') = \emptyset$.
Thus $P \overset{trace}{\longmapsto} P'$. □


7.3  Type-Directed  Garbage  Collection 

In the previous section, I gave a specification for tracing garbage collection as a rewriting
rule and showed that this new rewriting rule did not affect a program’s evaluation. However,
the rewriting rule is simply a specification and not an algorithm for computing a
collection of a program. It assumes some mechanism for partitioning the set of bindings
in the heap into two disjoint pieces, such that one set of bindings is unreachable from the
second set of bindings and the rest of the program. Real garbage collection algorithms
need a deterministic mechanism for generating this partitioning. In this section I formulate
an abstract version of such a mechanism, the tracing garbage collection algorithm,
by lifting the ideas of mark/sweep and copying collectors to the level of program syntax.

The  basic  idea  behind  an  algorithm  for  tracing  garbage  collectors  is  to  calculate  the 
set of locations accessible from the current context (i.e., the environment and stack).
These  locations  must  be  preserved  to  keep  the  program  closed.  Next,  for  each  location 
that  has  been  preserved,  we  examine  the  heap  value  to  which  this  location  is  bound. 
Each  location  within  this  heap  value  must  also  be  preserved.  We  iterate  this  process 
until  all  locations  in  the  heap  have  been  classified  as  accessible  or  inaccessible. 
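
The iteration just described is a worklist computation of reachability. The Python sketch below illustrates it under stated assumptions: heap values are tagged tuples, the roots are the locations found in the environment and stack, and `locations_in` recognizes locations by inspecting the class of each component. That last assumption amounts to having a tag on every value, which is what the type-directed approach of this section is meant to avoid. All names are invented for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    n: int


def locations_in(hval):
    """Locations stored directly in a heap value (code objects are closed)."""
    if hval[0] == "code":
        return []
    return [v for v in hval[1:] if isinstance(v, Loc)]


def trace(heap, roots):
    """Return the sub-heap whose locations are accessible from the roots."""
    kept = {}
    worklist = list(roots)
    while worklist:
        l = worklist.pop()
        if l in kept:                 # already classified as accessible
            continue
        kept[l] = heap[l]             # preserve this binding
        worklist.extend(locations_in(heap[l]))
    return kept


# Hypothetical heap with a cycle (a <-> b) and an unreachable chain (c -> d).
a, b, c, d = (Loc(n) for n in range(4))
heap = {
    a: ("tuple", 1, b),
    b: ("tuple", a, 2),
    c: ("tuple", d, 0),
    d: ("tuple", 3, 4),
}
live = trace(heap, roots=[a])
```

The `if l in kept` check is what makes the loop terminate on cyclic heaps; every binding not in the returned sub-heap is inaccessible and may be dropped.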

How do we calculate the set of locations within an environment or stack or heap value? One approach is to deconstruct the object in question based on its abstract syntax and simply find all of the $l$ objects. However, this requires that the distinctions between syntactic classes remain apparent at runtime. That is, we must be able to tell $l$ objects from $f$ and $i$ and () objects and we must be able to break tuples and closures into their components, determine which components are $l$ objects, etc. Fundamentally, this is a parsing problem: to deconstruct an object to find its location components, we must leave enough markers or tags in the representation of objects to determine what the components of the object are.

Tagging objects directly is unattractive because it can cost both space and time during computation. For example, if we tag integer and floating point values so that we can tell them from locations, then we can no longer directly use the machine’s primitive operations, such as addition, to implement our primitive integer and floating point operations. Instead, we must strip the tag(s) from the value, apply the machine operation, and then tag the result.
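The cost of tagging can be made concrete with a small sketch. It assumes a hypothetical low-bit tagging scheme (an integer $n$ is stored as $2n+1$, while locations are even words); the scheme and the helper names are illustrative assumptions and are not taken from the thesis.

```python
# Hypothetical low-bit tagging scheme (an assumption for illustration):
# an integer n is represented as (n << 1) | 1; a location is an even word.

def tag_int(n: int) -> int:
    """Convert a machine integer into its tagged representation."""
    return (n << 1) | 1

def untag_int(w: int) -> int:
    """Strip the tag bit to recover the machine integer."""
    return w >> 1

def is_location(w: int) -> bool:
    """A collector distinguishes locations from integers by the low bit."""
    return (w & 1) == 0

def tagged_add(a: int, b: int) -> int:
    """Addition on tagged integers: strip the tags, add, then re-tag."""
    return tag_int(untag_int(a) + untag_int(b))
```

Every arithmetic operation pays for two untagging steps and one tagging step around the single machine addition, which is the overhead that the type-directed approach aims to avoid.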

An alternative approach to tagging values is to use types to guide the process of finding the locations in an object. In particular, by the Canonical Forms Lemma, we know that if we have an answer of the form $(H, S, \rho, \mathtt{return}\ x)$ of type $\langle\tau_1 \times \tau_2\rangle$, then $x$ is bound to some location $l$ in $\rho$. Furthermore, we know that $H(l)$ is defined and is of the form $\langle v_1, v_2\rangle$. If $\tau_i$ is a tuple type, code type, or arrow type, then we know that $v_i$ is a location. In this fashion, given the type of an object in the heap, we can extract all the locations in that object.

Extracting locations from closures requires a bit more cooperation from the implementation. In particular, we must assume that all closures provide sufficient information for finding their environment components and the type of the environment. However, once this information is in hand, we can use the type of the environment argument of the code to determine all of the locations that are in the closure’s environment.

I formalize the process of extracting locations based on types as follows. First, I define a subset of types corresponding to heap values:

(pointer types) $\phi ::= \langle\tau_1 \times \tau_2\rangle \mid \mathtt{code}(\tau_{env}, \tau) \mid \tau_1 \to \tau_2$

Next, I construct a partial function, $\mathit{TL}_{hval}$, that maps a location, a pointer type, and a heap to a location type assignment. I use $\mathit{TL}$ to remind the reader that we are extracting a set of Typed Locations. Since code heap values never have free locations, the definition of $\mathit{TL}_{hval}$ at code types is simply the empty type assignment:

$\mathit{TL}_{hval}[l:\mathtt{code}(\tau_{env}, \tau), H] = \emptyset$

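The free locations of a tuple are those components whose types are pointer types. In the cases below, a component type written $\tau_i$ rather than $\phi_i$ is not a pointer type.

$\mathit{TL}_{hval}[l:\langle\phi_1 \times \phi_2\rangle, H \uplus \{l=\langle v_1, v_2\rangle\}] = \{v_1:\phi_1\} \cup \{v_2:\phi_2\}$

$\mathit{TL}_{hval}[l:\langle\phi_1 \times \tau_2\rangle, H \uplus \{l=\langle v_1, v_2\rangle\}] = \{v_1:\phi_1\}$

$\mathit{TL}_{hval}[l:\langle\tau_1 \times \phi_2\rangle, H \uplus \{l=\langle v_1, v_2\rangle\}] = \{v_2:\phi_2\}$

$\mathit{TL}_{hval}[l:\langle\tau_1 \times \tau_2\rangle, H \uplus \{l=\langle v_1, v_2\rangle\}] = \emptyset$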

The free locations of a closure include the location bound to the code and possibly the environment value. To determine if the environment is a location and to determine its type, we must look at the type ascribed to the environment argument of the code. If this type is a pointer type $\phi$, then the environment component of the closure must be a location whose contents are described by $\phi$.

$\mathit{TL}_{hval}[l:\tau_1 \to \tau_2, H] = \{l_{code}:\mathtt{code}(\phi, \tau_1 \to \tau_2),\ v_{env}:\phi\}$

when $H(l) = \langle\langle l_{code}, v_{env}\rangle\rangle$

and $H(l_{code}) = \mathtt{vcode}[x_{env}:\phi, x:\tau_1].e$

Otherwise, if the type of the environment argument is not a pointer type, then the environment of the closure is not a location and thus only the code pointer is in the resulting type assignment:

$\mathit{TL}_{hval}[l:\tau_1 \to \tau_2, H] = \{l_{code}:\mathtt{code}(\tau_{env}, \tau_1 \to \tau_2)\}$

when $H(l) = \langle\langle l_{code}, v_{env}\rangle\rangle$

and $H(l_{code}) = \mathtt{vcode}[x_{env}:\tau_{env}, x:\tau_1].e$
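The case analysis above can be sketched in Python. The encoding is an assumption made only for illustration: types are tuples such as `("pair", t1, t2)`, `("code", t_env, t)` and `("arrow", t1, t2)`; heap values are `("pair", v1, v2)`, `("closure", l_code, v_env)` and `("code", env_type, arg_type, body)`; locations are strings. None of these names come from the thesis.

```python
def is_pointer_type(t):
    """Pointer types (phi) are tuple, code, and arrow types."""
    return t[0] in ("pair", "code", "arrow")

def tl_hval(l, t, heap):
    """Sketch of TL_hval[l:t, H]: typed locations inside the value bound to l.

    Partial, like the original: raises KeyError if l is unbound and
    ValueError if t is not a pointer type.
    """
    h = heap[l]
    if t[0] == "code":
        # Code heap values never have free locations.
        return {}
    if t[0] == "pair":
        _tag, v1, v2 = h
        # Keep exactly those components whose type is a pointer type.
        return {v: ti for v, ti in ((v1, t[1]), (v2, t[2]))
                if is_pointer_type(ti)}
    if t[0] == "arrow":
        _tag, l_code, v_env = h
        _ctag, env_type, _arg_type, _body = heap[l_code]
        result = {l_code: ("code", env_type, t)}
        # The environment is a location only if the code says so.
        if is_pointer_type(env_type):
            result[v_env] = env_type
        return result
    raise ValueError("not a pointer type: %r" % (t,))
```

Note that the closure case consults the code object's declared environment type rather than any tag on `v_env`, which mirrors the text's requirement that closures expose the type of their environment.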

The following lemma shows that $\mathit{TL}_{hval}[l:\Psi(l), H]$ is always defined and consistent with $\Psi$ whenever $\Psi$ describes the contents of the heap $H$. Thus, if we know the type of some location, then we can always extract the pointers contained in the heap value bound to that location.

Lemma 7.3.1 (Canonical Heap Values) If $\vdash_{heap} H : \Psi \uplus \{l:\tau\}$, then $\mathit{TL}_{hval}[l:\tau, H] = \Psi_h$ for some $\Psi_h$, and $\Psi_h \subseteq \Psi \uplus \{l:\tau\}$.

Proof (sketch): By examination of the heap, heap value, and small value typing rules. □

The $\mathit{TL}_{hval}$ function provides the functionality that we need to keep a garbage collector running. All that remains is to extract the locations and their types from the stack and environment of a program.

$\mathit{TL}_{env}(\{x_1:\tau_1=v_1, \ldots, x_n:\tau_n=v_n\}) = \bigcup_{i=1}^{n} \{v_i:\tau_i \mid \exists\phi.\ \tau_i = \phi\}$

$\mathit{TL}_{stack}([\,]) = \emptyset$

$\mathit{TL}_{stack}(S[\rho, \lambda x:\tau.\ e]) = \mathit{TL}_{stack}(S) \cup \mathit{TL}_{env}(\rho)$
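As a rough illustration of these two definitions, the following sketch assumes an environment is a list of `(variable, type, value)` bindings and a stack is a list of frames `(env, x, type, body)`; this encoding, and the tuple encoding of types, are assumptions made for the example only.

```python
# Types are tuples: ("int",), ("pair", t1, t2), ("code", t_env, t),
# ("arrow", t1, t2). Locations are strings; other small values are numbers.

def is_pointer_type(t):
    """True for the pointer types phi: tuple, code, and arrow types."""
    return t[0] in ("pair", "code", "arrow")

def tl_env(env):
    """TL_env: keep v:tau for each binding x:tau=v whose type is a pointer type."""
    return {v: t for (_x, t, v) in env if is_pointer_type(t)}

def tl_stack(stack):
    """TL_stack: union of TL_env over the environment saved in every frame."""
    typed_locations = {}
    for (env, _x, _t, _body) in stack:
        typed_locations.update(tl_env(env))
    return typed_locations
```

Only the static type of each binding is inspected; the values themselves carry no tags.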

With these functions in hand, I can now construct an algorithm that finds all of the locations that must be preserved in a program. The algorithm is formulated as a rewriting system between triples $(H_f, \Psi_s, H_t)$ consisting of a heap, a location type assignment, and another heap. In traditional garbage collection terminology, the first heap is termed the “from-heap” or “from-space”, the location type assignment is called the “scan-set” or “frontier”, and the second heap is called the “to-heap” or “to-space”.

Initially, the from-space contains all of the bindings from the program’s heap; when the algorithm terminates, it contains those bindings that do not need to be preserved. Correspondingly, the to-space is initially empty; when the algorithm terminates, it contains all of the bindings that must be preserved. During each step of the algorithm, the scan-set contains the locations and their types that are bound in the from-space but are immediately reachable from the to-space. The scan-set is initialized by finding the locations in the current environment and stack.

The body of the algorithm proceeds as follows: a location $l$ of type $\phi$ is removed from the scan-set $\Psi_s$. If $l$ is bound in the from-space to a heap value, then we use $\mathit{TL}_{hval}[l:\phi, H]$ to extract the locations contained in $H(l)$, where $H$ is the union of the from- and to-spaces. For each such location $l'$, we check to see if $l'$ has already been forwarded to the to-space $H_t$. Only if $l'$ is not bound in $H_t$ do we add the location and its type to the scan-set $\Psi_s$. This ensures that a variable moves at most once from the from-space to the scan-set. I formalize this process via the following rewriting rule:

$(H_f \uplus \{l=h\},\ \Psi_s \uplus \{l:\phi\},\ H_t) \Rightarrow (H_f,\ \Psi_s \cup \Psi'_s,\ H_t \uplus \{l=h\})$

where $\Psi'_s = \{l':\phi' \in \mathit{TL}_{hval}[l:\phi, H_f \uplus \{l=h\} \uplus H_t] \mid l' \notin Dom(H_t) \uplus \{l\}\}$

Once the scan-set becomes empty, the algorithm terminates and the to-space is taken as the new, garbage-collected heap of the program, while the from-space is discarded. The initialization and finalization steps are captured by the following inference rule:

$$\frac{(H,\ \mathit{TL}_{env}(\rho) \cup \mathit{TL}_{stack}(S),\ \emptyset) \Rightarrow^{*} (H_f,\ \emptyset,\ H_t)}{(H, S, \rho, e) \stackrel{\text{tr-alg}}{\longmapsto} (H_t, S, \rho, e)}$$

To prove the correctness of this garbage collection algorithm, it suffices to show that, whenever $P \stackrel{\text{tr-alg}}{\longmapsto} P'$ then $P \stackrel{\text{trace}}{\longmapsto} P'$, since I have already shown that the tracing garbage collection specification is correct. However, to ensure that we have a proper algorithm, I must also show that there always exists some $P'$ such that $P \stackrel{\text{tr-alg}}{\longmapsto} P'$. That is, I must show that the algorithm does not get stuck.
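The from-space / scan-set / to-space rewriting described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the thesis's definition: heaps are dictionaries from location names to values, only pair types are treated as pointer types (closures are omitted to keep the sketch short), and `roots` stands for $\mathit{TL}_{env}(\rho) \cup \mathit{TL}_{stack}(S)$. All function names are hypothetical.

```python
def is_pointer_type(t):
    # Sketch only: pairs are the sole pointer type; closures are omitted.
    return t[0] == "pair"

def tl_hval(l, t, heap):
    """Typed locations inside the pair bound to l, guided by its type t."""
    _tag, v1, v2 = heap[l]
    return {v: ti for v, ti in ((v1, t[1]), (v2, t[2]))
            if is_pointer_type(ti)}

def trace_collect(heap, roots):
    """Return (to_space, from_space) once the scan-set is empty."""
    from_space = dict(heap)   # initially every binding
    scan_set = dict(roots)    # typed locations still to be forwarded
    to_space = {}             # initially empty
    while scan_set:
        l, t = scan_set.popitem()
        h = from_space.pop(l)
        # The union of from- and to-space is always the original heap.
        for l2, t2 in tl_hval(l, t, heap).items():
            if l2 not in to_space and l2 != l:
                scan_set[l2] = t2
        to_space[l] = h       # forward the binding l=h
    return to_space, from_space
```

The check `l2 not in to_space and l2 != l` plays the role of the side condition $l' \notin Dom(H_t) \uplus \{l\}$, so each binding is forwarded at most once.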

I begin by establishing a set of invariants, with respect to the original program, that are to be maintained by the algorithm. I note that if a program $P = (H, S, \rho, e)$ is well-typed, then there is a unique location type assignment $\Psi_P$, variable type assignment $\Gamma_P$, and unique types $\tau_P$ and $\tau'_P$ such that: $\emptyset \vdash_{heap} H : \Psi_P$, $\Psi_P \vdash_{stack} S : \tau'_P \to \tau_P$, $\Psi_P \vdash_{env} \rho : \Gamma_P$ and $\Gamma_P \vdash_{exp} e : \tau'_P$. Hence, for a well-typed $P$, I write $\Psi_P$, $\Gamma_P$, $\tau_P$, and $\tau'_P$ to represent these respective objects.


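Definition 7.3.2 (Well-Formedness) Let $P = (H, S, \rho, e)$ be a well-typed program. The tuple $(H_f, \Psi_s, H_t)$ is well-formed with respect to $P$ iff, taking $\Psi_t = \{l:\Psi_P(l) \mid l \in Dom(H_t)\}$ and $\Psi_f = \{l:\Psi_P(l) \mid l \in Dom(H_f)\}$:

1. $H_f \uplus H_t = H$

2. $\Psi_s \subseteq \Psi_f$

3. $\Psi_s \uplus \Psi_t \vdash_{stack} S : \tau'_P \to \tau_P$

4. $\Psi_s \uplus \Psi_t \vdash_{env} \rho : \Gamma_P$

5. $\Psi_s \vdash_{heap} H_t : \Psi_t$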

Roughly speaking, the invariants ensure that: (1) all of the heap bindings are accounted for in either the from-space or the to-space and these two spaces are disjoint, (2) the scan-set types some of the locations in the from-space and these types are consistent with the rest of the program, (3) the scan-set coupled with the types of locations in the to-space allow us to type the stack appropriately, (4) the scan-set coupled with the types of locations in the to-space allow us to type the environment appropriately, and (5) the scan-set allows us to type the to-space appropriately. The following lemma shows that these invariants are preserved by the algorithm.

Lemma 7.3.3 (Preservation) If $\vdash_{prog} P : \tau_P$, $(H_f, \Psi_s, H_t)$ is well-formed with respect to $P$ and $(H_f, \Psi_s, H_t) \Rightarrow (H'_f, \Psi'_s, H'_t)$, then $(H'_f, \Psi'_s, H'_t)$ is well-formed with respect to $P$.


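Proof: Suppose $(H_f \uplus \{l=h\}, \Psi_s \uplus \{l:\tau\}, H_t)$ is well-formed with respect to $P = (H, S, \rho, e)$ and suppose $(H_f \uplus \{l=h\}, \Psi_s \uplus \{l:\tau\}, H_t) \Rightarrow (H_f, \Psi_s \cup \Psi'_s, H_t \uplus \{l=h\})$ where $\Psi'_s = \{l':\tau' \in \mathit{TL}_{hval}[l:\tau, H_f \uplus \{l=h\} \uplus H_t] \mid l' \notin Dom(H_t)\}$.

By condition (1), $H = (H_f \uplus \{l=h\}) \uplus H_t$; thus, $H = H_f \uplus (H_t \uplus \{l=h\})$.

By condition (2), we know that $l:\tau \in \Psi_P$ and $\Psi_s \subseteq \Psi_P$. By the Canonical Heap Values Lemma, $\mathit{TL}_{hval}[\tau](h) \subseteq \Psi_P$. Hence, $\Psi'_s \subseteq \Psi_s \cup \Psi'_s \subseteq \Psi_P$.

By conditions (3) and (4), taking $\Psi = (\Psi_s \uplus \{l:\tau\}) \uplus \Psi_t$, we know that $\Psi \vdash_{stack} S : \tau'_P \to \tau_P$ and $\Psi \vdash_{env} \rho : \Gamma_P$. By condition (2) we know that $l:\tau \in \Psi_P$. Thus, $\Psi_t \uplus \{l:\tau\}$ is well-formed and taking $\Psi' = (\Psi_s \cup \Psi'_s) \uplus (\Psi_t \uplus \{l:\tau\})$ we have $\Psi' \vdash_{stack} S : \tau'_P \to \tau_P$ and $\Psi' \vdash_{env} \rho : \Gamma_P$.

By condition (5), $\Psi_s \uplus \{l:\tau\} \vdash_{heap} H_t : \Psi_t$. I must show $(\Psi_s \cup \Psi'_s) \vdash_{heap} H_t \uplus \{l=h\} : \Psi'_t$ where $\Psi'_t = \Psi_t \uplus \{l:\tau\}$. By the Canonical Heap Values Lemma, $\mathit{TL}_{hval}[l:\tau, H] \vdash_{hval} h : \tau$, where $H = H_f \uplus \{l=h\} \uplus H_t$. Hence, $(\Psi_s \cup \Psi'_s) \uplus \Psi'_t \vdash_{hval} h : \tau$. Thus, by the heap typing rule, $(\Psi_s \cup \Psi'_s) \vdash_{heap} H_t \uplus \{l=h\} : \Psi_t \uplus \{l:\tau\}$. □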

The  following  lemma  shows  that  at  each  step  in  the  algorithm,  either  the  scan-set  is 
empty  —  in  which  case  the  algorithm  is  finished  —  or  else  the  algorithm  can  step  to  a 
new  state. 

Lemma 7.3.4 (Progress) If $\vdash_{prog} P : \tau_P$ and $(H_f, \Psi_s, H_t)$ is well-formed with respect to $P$, then either $\Psi_s$ is empty or else $(H_f, \Psi_s, H_t) \Rightarrow (H'_f, \Psi'_s, H'_t)$ for some $(H'_f, \Psi'_s, H'_t)$.

Proof: Suppose $\Psi_s$ contains the binding $l:\tau$ and let $H = H_f \uplus H_t$. First, I must show that $l$ is bound to some heap value in $H_f$ and that heap value has a shape described by $\tau$. By the second requirement of well-formedness and the definition of $\Psi_f$, we know that $\Psi_f(l) = \Psi_P(l) = \tau$. Since $\vdash_{heap} H : \Psi_P$, by the Canonical Heap Values Lemma, we know that $H(l) = h$ for some $h$, $\mathit{TL}_{hval}[l:\tau, H] = \Psi_h$, and $\Psi_h \subseteq \Psi_P$. By the definition of $\Psi_f$ and the first requirement of well-formedness, the binding $l=h$ must be in $H_f$.

Taking $\Psi'' = \{l':\tau' \in \Psi_h \mid l' \notin Dom(H_t)\}$, I must now show that $(\Psi_s \setminus \{l:\tau\}) \cup \Psi''$ is a valid location type assignment. But since $\Psi_h \subseteq \Psi_P$, then $\Psi'' \subseteq \Psi_P$. By the second requirement of well-formedness, $\Psi_s \subseteq \Psi_f \subseteq \Psi_P$, so $(\Psi_s \setminus \{l:\tau\}) \cup \Psi''$ is well formed as a subset of $\Psi_P$.

Finally, since $H_f$ and $H_t$ are disjoint but cover $H$, $l$ cannot be bound in $H_t$ and hence $H_t \uplus \{l=h\}$ is well-formed. Thus, taking $H'_f = H_f \setminus \{l=h\}$, $\Psi'_s = (\Psi_s \setminus \{l:\tau\}) \cup \Psi''$, and $H'_t = H_t \uplus \{l=h\}$, we know that $(H_f, \Psi_s, H_t) \Rightarrow (H'_f, \Psi'_s, H'_t)$. □

With  the  Preservation  and  Progress  Lemmas  in  hand,  I  can  establish  the  correctness 
of  the  algorithm. 

Theorem 7.3.5 (Tracing Algorithm Correctness) If $P$ is well-typed, then there exists a $P'$ such that $P \stackrel{\text{tr-alg}}{\longmapsto} P'$ and $P \simeq P'$.

Proof: Let $P = (H, S, \rho, e)$. First, I must show that $(H, \mathit{TL}_{env}(\rho) \cup \mathit{TL}_{stack}(S), \emptyset)$ exists and is well-formed with respect to $P$. Since the to-space is empty, conditions one and five of well-formedness are trivially satisfied. By the env typing rule, it is clear that $\mathit{TL}_{env}(\rho)$ exists and is a subset of $\Psi_P$. By the stack typing rules and the env typing rule, it is clear that $\mathit{TL}_{stack}(S)$ exists and is a subset of $\Psi_P$. Hence, $\mathit{TL}_{env}(\rho) \cup \mathit{TL}_{stack}(S)$ is a well-formed location type assignment that is a subset of $\Psi_P$. Furthermore, it is clear that all of the free locations in both the stack and environment are contained in this location type assignment. Hence, conditions two, three, and four are satisfied and the initial tuple is well-formed with respect to $P$.

Since $(H, \mathit{TL}_{env}(\rho) \cup \mathit{TL}_{stack}(S), \emptyset)$ is well-formed, by progress, the algorithm will continue to run until the scan-set is empty, at which point the algorithm terminates in the state $(H_f, \emptyset, H_t)$. By preservation, we know that this tuple is well-formed with respect to $P$. Hence, we know that, taking $\Psi_t = \{l:\tau \mid l \in \Psi_P\}$, $\Psi_t \vdash_{stack} S : [\tau'_P] \to \tau_P$, $\Psi_t \vdash_{env} \rho : \Gamma_P$, and $\vdash_{heap} H_t : \Psi_t$. Consequently, the program $P' = (H_t, S, \rho, e)$ is closed and thus $P \stackrel{\text{trace}}{\longmapsto} P'$. By the correctness of the tracing specification, we know that $P \simeq P'$. Thus, $P'$ is a collection of $P$. □

Finally,  the  tracing  algorithm  that  I  have  presented  is  optimal  with  respect  to  my 
definition  of  garbage. 

Theorem 7.3.6 Let $P \stackrel{\text{tr-alg}}{\longmapsto} (H_1, S, \rho, e)$. If $(H_2, S, \rho, e)$ is a collection of $P$ then $H_1 \subseteq H_2$.

Proof: By theorem 7.2.8, it suffices to show that if $P \stackrel{\text{trace}}{\longmapsto} P'$ where $P' = (H_2, S, \rho, e)$, then $H_1 \subseteq H_2$. Let $l=h$ be a binding in $H_1$. I must show that this binding is also in $H_2$. I do so by analyzing how $l$ is placed in the scan-set and hence forwarded from the from-space to the to-space during the execution of the tracing algorithm.

If $l$ is placed in the initial scan-set, then $l$ occurs free in either the range of $\rho$ or else the range of the environment of some closure on the stack. Hence, $l$ must be bound in $H_2$ to keep $P'$ closed.

Suppose $l$ is placed in the scan-set because it is found via $\mathit{TL}_{hval}[\tau'](h')$ for some $l':\tau'$ already in the scan-set. The binding $l'=h'$ is forwarded to the to-space. Therefore, $h'$ is in the range of $H_2$. But then $\mathit{FL}_{hval}(h')$ contains $l$. Thus, for $P'$ to be closed, $H_2$ must contain the binding for $l$. □


7.4  Generational  Collection 

The tracing garbage collection algorithm I presented in the previous section examines all of the reachable bindings in the heap to determine that the rest of the bindings may be removed. By carefully partitioning the heap into smaller heaps, a garbage collector can scan less than the whole heap and still free significant amounts of memory. A generational partition of a program’s heap is a sequence of sub-heaps ordered in such a way that “older” generations never have pointers to “younger” generations.

Definition 7.4.1 (Generational Partition) A generational partition of a heap $H$ is a sequence of heaps $H_1, H_2, \ldots, H_n$ such that $H = H_1 \uplus H_2 \uplus \cdots \uplus H_n$ and for all $i$ such that $1 \le i < n$, $\mathit{FL}_{heap}(H_i) \cap Dom(H_{i+1} \uplus H_{i+2} \uplus \cdots \uplus H_n) = \emptyset$. The $H_i$ are referred to as generations and $H_i$ is said to be an older generation than $H_j$ if $i < j$.
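The definition can be read as an executable check. The sketch below assumes each generation is a dictionary from location names (strings) to heap values encoded as tuples of small values, oldest generation first; the encoding and function names are assumptions for illustration.

```python
def free_locations(heap):
    """Sketch of FL_heap: every location (string) mentioned by a value in heap."""
    return {v for h in heap.values() for v in h if isinstance(v, str)}

def is_generational_partition(generations):
    """Check Definition 7.4.1 for a list of heaps, oldest generation first."""
    # The generations must have pairwise disjoint domains (H = H1 + ... + Hn).
    seen = set()
    for g in generations:
        if seen & set(g):
            return False
        seen.update(g)
    # No generation may mention a location bound in a younger generation.
    for i, g in enumerate(generations):
        younger = set()
        for y in generations[i + 1:]:
            younger.update(y)
        if free_locations(g) & younger:
            return False
    return True
```

For example, a young pair that points at an old pair is acceptable, while the reverse ordering of the same two heaps is rejected.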

Given a generational partition of a program’s heap, a tracing garbage collector can eliminate a set of bindings in younger generations without looking at any older generations.

Theorem 7.4.2 (Generational Collection) Let $H_1, \ldots, H_i, \ldots, H_n$ be a generational partition of the heap of $P = (H, S, \rho, e)$. Suppose $H_i = (H_i^1 \uplus H_i^2)$ and $Dom(H_i^2) \cap \mathit{FL}_{prog}(H_i^1 \uplus H_{i+1} \uplus \cdots \uplus H_n, S, \rho, e) = \emptyset$. Then $P \stackrel{\text{trace}}{\longmapsto} (H \setminus H_i^2, S, \rho, e)$.

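Proof: I must show that $Dom(H_i^2) \cap \mathit{FL}_{prog}(H \setminus H_i^2, S, \rho, e) = \emptyset$. Since $H_1, \ldots, H_n$ is a generational partition of $H$, for all $j$, $1 \le j < i$, $\mathit{FL}_{heap}(H_j) \cap Dom(H_{j+1} \uplus \cdots \uplus H_n) = \emptyset$. Hence, $\mathit{FL}_{heap}(H_1 \uplus \cdots \uplus H_{i-1}) \cap Dom(H_i^2) = \emptyset$. Now,

$$\begin{aligned}
&\mathit{FL}_{prog}(H \setminus H_i^2, S, \rho, e) \cap Dom(H_i^2) \\
&= (\mathit{FL}_{heap}(H \setminus H_i^2) \cup \mathit{FL}_{stack}(S) \cup \mathit{FL}_{env}(\rho) \cup \mathit{FL}_{exp}(e)) \cap Dom(H_i^2) \\
&= (\mathit{FL}_{heap}(H_1 \uplus \cdots \uplus H_{i-1}) \cup \mathit{FL}_{heap}(H_i^1 \uplus \cdots \uplus H_n) \cup \mathit{FL}_{stack}(S) \cup \mathit{FL}_{env}(\rho) \cup \mathit{FL}_{exp}(e)) \cap Dom(H_i^2) \\
&= (\mathit{FL}_{heap}(H_1 \uplus \cdots \uplus H_{i-1}) \cap Dom(H_i^2)) \cup ((\mathit{FL}_{heap}(H_i^1 \uplus \cdots \uplus H_n) \cup \mathit{FL}_{stack}(S) \cup \mathit{FL}_{env}(\rho) \cup \mathit{FL}_{exp}(e)) \cap Dom(H_i^2)) \\
&= \emptyset \cup ((\mathit{FL}_{heap}(H_i^1 \uplus \cdots \uplus H_n) \cup \mathit{FL}_{stack}(S) \cup \mathit{FL}_{env}(\rho) \cup \mathit{FL}_{exp}(e)) \cap Dom(H_i^2)) \\
&= \mathit{FL}_{prog}(H_i^1 \uplus \cdots \uplus H_n, S, \rho, e) \cap Dom(H_i^2) \\
&= \emptyset
\end{aligned}$$

□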

Generational  collection  is  important  for  three  practical  reasons:  first,  evaluation  of 
programs  makes  it  easy  to  maintain  generational  partitions. 

Theorem 7.4.3 (Generational Preservation) Let $P = (H, S, \rho, e)$ be a well-typed program. If $H_1, \ldots, H_n$ is a generational partition of $H$ and $P \longmapsto (H \uplus H', S, \rho, e)$, then $H_1, \ldots, H_n, H'$ is a generational partition of $H \uplus H'$.

Proof: The only evaluation rules that modify the heap are the rules that allocate tuples and closures. The other rules leave the heap intact and hence preserve the partition trivially. Since the allocation rules only add a binding to the heap and do not modify the rest of the heap, all I must show is that there are no references in the older generations to this new location. But this must be true since a new location is chosen for the allocated heap value. □

Clearly,  the  addition  of  certain  language  features  such  as  assignment  or  memoization 
breaks  the  Generational  Preservation  Theorem.  The  problem  with  these  features  is  that 
bindings  in  the  heap  can  be  updated  so  that  a  heap  value  in  an  older  generation  contains 
a  reference  to  a  location  in  a  younger  generation.  It  is  possible  to  maintain  a  generational 
partition  for  such  languages  by  keeping  track  of  all  older  bindings  that  are  updated  and 
by  moving  them  from  the  older  generation  to  a  younger  generation.  The  mechanism  that 
tracks  updates  to  older  generations  is  called  a  write  barrier.  Wilson’s  overview  provides 
many  examples  of  techniques  used  to  implement  write  barriers  [128]. 

The  second  reason  generational  collection  is  important  is  that,  given  a  generational 
partition,  we  can  directly  use  the  tracing  collection  algorithm  to  generate  a  generational 
collection  of  a  program. 

The following generational collection rule simply forwards the entire older generation at the beginning of the algorithm and then processes the younger generation.

$$\frac{H_1, H_2 \text{ a generational partition} \qquad (H_2,\ \{l:\tau \in \mathit{TL}_{stack}(S) \cup \mathit{TL}_{env}(\rho) \mid l \notin Dom(H_1)\},\ H_1) \Rightarrow^{*} (H_f,\ \emptyset,\ H_1 \uplus H'_2)}{(H_1 \uplus H_2, S, \rho, e) \stackrel{\text{gen-alg}}{\longmapsto} (H_1 \uplus H'_2, S, \rho, e)}$$

The  rule’s  soundness  follows  directly  from  the  Generational  Collection  Theorem,  as  well 
as  the  soundness  of  the  tracing  collection  algorithm. 
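The generational rule reuses the tracing loop with a different initial state: the to-space starts out as the whole older generation, and only roots outside the older generation are scanned. The sketch below illustrates this under the same simplifying assumptions as a pairs-only heap (dictionaries of `("pair", v1, v2)` values, tuple-encoded types); the function names are hypothetical.

```python
def is_pointer_type(t):
    # Sketch only: pairs are the sole pointer type.
    return t[0] == "pair"

def tl_hval(l, t, heap):
    """Typed locations inside the pair bound to l, guided by its type t."""
    _tag, v1, v2 = heap[l]
    return {v: ti for v, ti in ((v1, t[1]), (v2, t[2]))
            if is_pointer_type(ti)}

def collect_young(old, young, roots):
    """Collect only the younger generation; return (new_heap, garbage)."""
    heap = {**old, **young}
    to_space = dict(old)      # the older generation is forwarded wholesale
    from_space = dict(young)
    scan_set = {l: t for l, t in roots.items() if l not in old}
    while scan_set:
        l, t = scan_set.popitem()
        h = from_space.pop(l)
        for l2, t2 in tl_hval(l, t, heap).items():
            # Old-generation locations are already in to_space: never scanned.
            if l2 not in to_space and l2 != l:
                scan_set[l2] = t2
        to_space[l] = h
    return to_space, from_space
```

Because the older generation is never scanned, any garbage it contains survives this collection; that is the price paid for looking only at the younger generation.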

The  third  reason  generational  collection  is  important  is  that  empirical  evidence  shows 
that  “objects  tend  to  die  young”  [120].  That  is,  recently  allocated  bindings  are  more  likely 
to  become  garbage  in  a  small  number  of  evaluation  steps.  Thus,  if  we  place  recently 
allocated  bindings  in  younger  generations,  we  can  concentrate  our  collection  efforts  on 
these  generations,  ignoring  older  generations,  and  still  eliminate  most  of  the  garbage. 


7.5  Polymorphic  Tag-Free  Garbage  Collection

In this section, I show how to apply type-based, tag-free garbage collection to a $\lambda_i^{ML}$-based language called $\lambda_i^{ML}$-GC. It is possible to give a low-level operational semantics for $\lambda_i^{ML}$ in the style of Mono-GC, where environments and the stack are made explicit. However, the type structure of $\lambda_i^{ML}$ is considerably more complex than for Mono-GC, and as a result, proving even relatively basic properties, such as type preservation, is considerably more difficult than in the simply-typed setting. Consequently, I use a somewhat higher-level semantics to describe evaluation of $\lambda_i^{ML}$-GC programs. This semantics is a cross
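between the contextual rewriting semantics used in earlier chapters and the allocation

(kinds) $\kappa ::= \Omega \mid \kappa_1 \to \kappa_2$
(con) $\mu ::= (\nu :: \kappa)$
(raw con) $\nu ::= t \mid u \mid \mathtt{Arrow}(\mu_1, \mu_2) \mid \lambda t{::}\kappa.\mu \mid \mu_1\ \mu_2 \mid \mathtt{Typerec}\ \mu\ \mathtt{of}\ (\mu_i; \mu_a)$
(small con) $u ::= l \mid \mathtt{Int}$
(heap con) $q ::= \mathtt{Arrow}(u_1, u_2) \mid \lambda t{::}\kappa.\mu$
(con heap) $Q ::= \{l_1 = q_1, \ldots, l_n = q_n\}$
(types) $\sigma ::= T(\mu) \mid \mathtt{int} \mid \sigma_1 \to \sigma_2 \mid \forall t{::}\kappa.\sigma$
(norm types) $\varsigma ::= \mathtt{int} \mid \sigma_1 \to \sigma_2 \mid \forall t{::}\kappa.\sigma$
(exp) $e ::= x \mid (v : \varsigma) \mid (r : \sigma)$
(raw exp) $r ::= \lambda x{:}\sigma.e \mid e_1\ e_2 \mid \Lambda t{::}\kappa.e \mid e\ [\mu] \mid \mathtt{typerec}\ \mu\ \mathtt{of}\ [t.\sigma](e_i; e_a)$
(small val) $v ::= i \mid l$
(heap val) $h ::= \lambda x{:}\sigma.e \mid \Lambda t{::}\kappa.e$
(val heap) $H ::= \{l_1 = h_1, \ldots, l_n = h_n\}$
(program) $P ::= (Q, H, e)$
(answer) $A ::= (Q, H, v : \varsigma)$

Figure 7.7: Syntax of $\lambda_i^{ML}$-GC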

semantics  of  Mono-GC.  Heaps  are  left  explicit,  but  the  stack  and  environments  are 
implicitly  represented  by  evaluation  contexts  and  meta-level  substitution  of  small  values 
for  variables.  The  resulting  system  abstracts  enough  details  that  proofs  are  tractable, 
yet  exposes  the  key  issues  of  tag-free  collection. 

7.5.1  $\lambda_i^{ML}$-GC

The syntax of $\lambda_i^{ML}$-GC is given in Figure 7.7. Programs consist of two heaps and an expression. The $Q$ heap maps locations to constructor heap values, whereas the $H$ heap maps locations to expression heap values. In practice, one heap suffices for both constructors and expressions, but making a distinction simplifies the static semantics for the language. I assume that the constructor heap of a program contains no cycles (this is reflected in the static semantics below), but make no such assumption regarding the expression heap. The expression of the program can refer directly to locations bound in the heaps, instead of indirecting through an environment. As with Mono-GC, I consider programs to be equivalent up to reordering and $\alpha$-conversion of locations bound in the heap and variables bound in constructors and expressions.

Each raw constructor $\nu$ is labelled with its kind, and almost all raw expressions
are  labelled  with  types.  This  information  corresponds  to  the  kind  or  type  information 
that  would  be  present  on  bound  variables  of  a  let-expression  in  an  A-normal  form. 
By  labelling  nested  computations  with  kind  or  type  information  at  compile  time,  we 
effectively  assign  kinds/types  to  any  intermediate  values  allocated  during  computation. 
Constructor  values  and  expression  values  in  the  heaps  are  not  paired  with  summary  kind 
or  type  information;  this  information  is  recovered  during  garbage  collection  from  the 
information  labelling  computations. 

During garbage collection, we need to determine shape information concerning values found in computations from the kinds and types labelling these values. Unfortunately, it is not always possible to determine an expression value’s shape from its type. In particular, if the type is of the form $T(\mu)$ where $\mu$ is some constructor computation, then we must “run” the computation to reach at least a head-normal form, either $\mathtt{Int}$ or $\mathtt{Arrow}(\mu_1, \mu_2)$, in order to determine the shape of the object². But running this constructor computation during garbage collection is problematic because the computation may need to allocate values. It seems as though to free storage, we must garbage collect, but to garbage collect, we might need to allocate more storage.

The solution to this problem is to “run” the necessary constructors within types during evaluation and ensure that all needed types are always in head-normal form when garbage collection is invoked. This constraint is reflected in the syntax of $\lambda_i^{ML}$-GC by the fact that only $\varsigma$ types (types in head-normal form) can label small values. The evaluation rules for $\lambda_i^{ML}$-GC (see below) ensure that this constraint is maintained at all times. In particular, small values and their normalized type labels are substituted for free variables during evaluation. In the TIL compiler (see Chapter 8), this constraint is maintained by explicitly reifying type computations and by labelling variables of unknown shape with a reified type. If $\lambda_i^{ML}$ did not have a phase distinction between types and terms (i.e., if types were dependent upon terms in some fashion), then we could not guarantee that all needed types would be computed before garbage collection was invoked.
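Once a constructor has been evaluated to a small constructor (either `Int` or a location bound in the constructor heap), putting $T(u)$ into head-normal form is a lookup that allocates nothing. The sketch below illustrates that step; the encoding (small constructors as strings, the constructor heap `Q` as a dictionary, the function name) is an assumption for illustration and not the thesis's formal rules.

```python
def head_normalize(Q, u):
    """Head-normal form of the type T(u) for a small constructor u.

    u is either the string "Int" or a location bound in the constructor
    heap Q to ("Arrow", u1, u2). No new heap bindings are created.
    """
    if u == "Int":
        return ("int",)
    q = Q[u]
    if q[0] == "Arrow":
        _tag, u1, u2 = q
        # The components stay as T(u1), T(u2); only the head is exposed.
        return ("arrow", ("T", u1), ("T", u2))
    raise ValueError("constructor at %r has no type interpretation" % (u,))
```

For instance, with `Q = {"l1": ("Arrow", "Int", "l2"), "l2": ("Arrow", "Int", "Int")}`, calling `head_normalize(Q, "l1")` exposes an arrow type whose components are still of the form `("T", ...)`, which is enough to know that a value of this type is a location.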

Figure 7.8 gives the evaluation contexts and instructions for the constructors, types, and terms of $\lambda_i^{ML}$-GC, and Figures 7.9 and 7.10 give the rewriting rules for the language.
As  in  Mono-GC,  large  values  are  allocated  on  the  heap  and  replaced  with  a  reference  to 
the  appropriate  location.  Unlike  Mono-GC,  I  use  evaluation  contexts  to  determine  the 
next  instruction  instead  of  relying  upon  an  A-normal  form  to  provide  explicit  sequencing 
and  a  stack  to  represent  the  continuation.  Furthermore,  I  use  meta-level  substitution 
of  small  values  for  variables  at  function  application,  instead  of  installing  these  values  in 

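² Head-normal forms are not always sufficient to determine shape. For example, components of pair types must also be normalized so that we can determine which components of the pair are locations.

(con ctxt) $U ::= [\,] \mid (N :: \kappa)$
(raw ctxt) $N ::= \mathtt{Arrow}(U, \mu) \mid \mathtt{Arrow}(u{::}\kappa, U) \mid U\ \mu \mid (u{::}\kappa)\ U \mid \mathtt{Typerec}\ U\ \mathtt{of}\ (\mu_i; \mu_a)$
(con instr) $J ::= \mathtt{Arrow}(u_1{::}\kappa_1, u_2{::}\kappa_2) \mid \lambda t{::}\kappa.\mu \mid (u_1{::}\kappa_1)\ (u_2{::}\kappa_2) \mid \mathtt{Typerec}\ (u{::}\kappa)\ \mathtt{of}\ (\mu_i; \mu_a)$
(type instr) $K ::= T(u{::}\kappa) \mid T(U[J{::}\kappa])$
(exp ctxt) $E ::= [\,] \mid (R : \sigma)$
(raw ctxt) $R ::= E\ e \mid (l{:}\varsigma)\ E \mid E\,[\mu]$
(exp instr) $I ::= (\lambda x{:}\sigma.e) : K \mid (\lambda x{:}\sigma.e) : \varsigma \mid (\Lambda t{::}\kappa.e) : \varsigma \mid ((l{:}\varsigma)\ (v{:}\varsigma)) : \sigma \mid ((l{:}\varsigma)\ U[J{::}\kappa]) : \sigma \mid ((l{:}\varsigma)\ [u{::}\kappa]) : \sigma \mid (\mathtt{typerec}\ U[J{::}\kappa]\ \mathtt{of}\ [t.\sigma'](e_i; e_a)) : \sigma \mid (\mathtt{typerec}\ (u{::}\kappa)\ \mathtt{of}\ [t.\sigma'](e_i; e_a)) : \sigma$

Figure 7.8: $\lambda_i^{ML}$-GC Evaluation Contexts and Instructions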


an  environment.  This  avoids  the  need  to  assume  closure  conversion,  greatly  simplifying 
both  the  dynamic  and  static  semantics  for  the  language. 

Note that evaluation of $\lambda$-expressions at the term level proceeds in two stages: first, the type labelling the expression is evaluated. Second, the $\lambda$-expression is bound to a new location on the heap and is replaced with this location within the expression. There is no need to evaluate the type labelling a $\Lambda$-expression, since this type must always begin with $\forall$ and hence is already in head-normal form. The rest of the rewriting rules are fairly standard and reflect the left-to-right, call-by-value evaluation strategy of the language.

It is fairly easy to see that evaluation of $\lambda_i^{ML}$-GC programs preserves the cycle-freedom of the constructor heap and that, given a cycle-free expression heap, evaluation preserves cycle-freedom.

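7.5.2  Static  Semantics  of  $\lambda_i^{ML}$-GC

In the description of the static semantics for $\lambda_i^{ML}$-GC, I use the following meta-variables to range over various sorts of assignments, mapping variables or locations to kinds and

1. $(Q,\ \mathtt{Arrow}(u_1{::}\kappa_1, u_2{::}\kappa_2) :: \kappa) \longmapsto (Q \uplus \{l = \mathtt{Arrow}(u_1, u_2)\},\ l{::}\kappa)$

2. $(Q,\ (\lambda t{::}\kappa_1.\mu) :: \kappa) \longmapsto (Q \uplus \{l = \lambda t{::}\kappa_1.\mu\},\ l{::}\kappa)$

3. $(Q,\ (l{::}\kappa_1)\ (u{::}\kappa_2)) \longmapsto (Q,\ \{u/t\}\mu)$   $(Q(l) = \lambda t{::}\kappa.\mu)$

4. $(Q,\ (\mathtt{Typerec}\ (\mathtt{Int}{::}\kappa')\ \mathtt{of}\ (\mu_i; \mu_a)) :: \kappa) \longmapsto (Q,\ \mu_i)$

5. $(Q,\ (\mathtt{Typerec}\ (l{::}\kappa')\ \mathtt{of}\ (\mu_i; \mu_a)) :: \kappa) \longmapsto$
$$\begin{array}{l}
(Q,\ ((((\mu_a\ (u_1{::}\Omega) :: \Omega \to \kappa \to \kappa \to \kappa) \\
\quad (u_2{::}\Omega) :: \kappa \to \kappa \to \kappa) \\
\quad (\mathtt{Typerec}\ (u_1{::}\Omega)\ \mathtt{of}\ (\mu_i; \mu_a) :: \kappa) :: \kappa \to \kappa) \\
\quad (\mathtt{Typerec}\ (u_2{::}\Omega)\ \mathtt{of}\ (\mu_i; \mu_a) :: \kappa) :: \kappa)) \qquad (Q(l) = \mathtt{Arrow}(u_1, u_2))
\end{array}$$

6. $$\frac{(Q,\ J{::}\kappa) \longmapsto (Q',\ \mu)}{(Q,\ U[J{::}\kappa]) \longmapsto (Q',\ U[\mu])}$$

7. $(Q,\ T(\mathtt{Int}{::}\kappa)) \longmapsto (Q,\ \mathtt{int})$

8. $(Q,\ T(l{::}\Omega)) \longmapsto (Q,\ T(u_1{::}\kappa_1) \to T(u_2{::}\kappa_2))$   $(Q(l) = \mathtt{Arrow}(u_1, u_2))$

9. $$\frac{(Q,\ J{::}\kappa) \longmapsto (Q',\ \mu)}{(Q,\ S[T(U[J{::}\kappa])]) \longmapsto (Q',\ S[T(U[\mu])])}$$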
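Figure 7.9: $\lambda_i^{ML}$-GC Constructor and Type Rewriting Rules

10. $$\frac{(Q,\ K) \longmapsto (Q',\ \sigma)}{(Q,\ H,\ (\lambda x{:}\sigma'.e) : K) \longmapsto (Q',\ H,\ (\lambda x{:}\sigma'.e) : \sigma)}$$

11. $(Q,\ H,\ (\lambda x{:}\sigma.e) : \varsigma) \longmapsto (Q,\ H \uplus \{l = \lambda x{:}\sigma.e\},\ l{:}\varsigma)$

12. $(Q,\ H,\ (\Lambda t{::}\kappa.e) : \varsigma) \longmapsto (Q,\ H \uplus \{l = \Lambda t{::}\kappa.e\},\ l{:}\varsigma)$

13. $(Q,\ H,\ ((l{:}\varsigma_1)\ (v{:}\varsigma)) : \sigma) \longmapsto (Q,\ H,\ \{(v{:}\varsigma)/x\}e)$   $(H(l) = \lambda x{:}\sigma'.e)$

14. $$\frac{(Q,\ U[J{::}\kappa]) \longmapsto (Q',\ U[\mu])}{(Q,\ H,\ ((l{:}\varsigma)\ U[J{::}\kappa]) : \sigma) \longmapsto (Q',\ H,\ ((l{:}\varsigma)\ U[\mu]) : \sigma)}$$

15. $(Q,\ H,\ ((l{:}\varsigma)\ (u{::}\kappa)) : \sigma) \longmapsto (Q,\ H,\ \{u/t\}e)$   $(H(l) = \Lambda t{::}\kappa.e)$

16. $$\frac{(Q,\ U[J{::}\kappa]) \longmapsto (Q',\ U[\mu])}{(Q,\ H,\ (\mathtt{typerec}\ U[J{::}\kappa]\ \mathtt{of}\ [t.\sigma'](e_i; e_a)) : \sigma) \longmapsto (Q',\ H,\ (\mathtt{typerec}\ U[\mu]\ \mathtt{of}\ [t.\sigma'](e_i; e_a)) : \sigma)}$$

17. $(Q,\ H,\ (\mathtt{typerec}\ (\mathtt{Int}{::}\kappa)\ \mathtt{of}\ [t.\sigma'](e_i; e_a)) : \sigma) \longmapsto (Q,\ H,\ e_i)$

18. $(Q,\ H,\ (\mathtt{typerec}\ (l{::}\kappa)\ \mathtt{of}\ [t.\sigma'](e_i; e_a)) : \sigma) \longmapsto$
$$\begin{array}{l}
(Q,\ H,\ ((((((((e_a\ [u_1{::}\Omega]) : \forall t_2{::}\Omega.\{u_1/t\}\sigma' \to \{t_2/t\}\sigma' \to \sigma) \\
\quad [u_2{::}\Omega]) : \{u_1/t\}\sigma' \to \{u_2/t\}\sigma' \to \sigma) \\
\quad ((\mathtt{typerec}\ (u_1{::}\Omega)\ \mathtt{of}\ [t.\sigma'](e_i; e_a)) : \{u_1/t\}\sigma')) : \{u_2/t\}\sigma' \to \sigma) \\
\quad ((\mathtt{typerec}\ (u_2{::}\Omega)\ \mathtt{of}\ [t.\sigma'](e_i; e_a)) : \{u_2/t\}\sigma')) : \sigma)) \qquad (Q(l) = \mathtt{Arrow}(u_1, u_2))
\end{array}$$

19. $$\frac{(Q,\ H,\ I{:}\sigma) \longmapsto (Q',\ H',\ e)}{(Q,\ H,\ E[I{:}\sigma]) \longmapsto (Q',\ H',\ E[e])}$$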
Figure 7.10: $\lambda_i^{ML}$-GC Expression Rewriting Rules

types:

(variable kind assignment) $\Delta ::= \{t_1{::}\kappa_1, \ldots, t_n{::}\kappa_n\}$

(variable type assignment) $\Gamma ::= \{x_1{:}\sigma_1, \ldots, x_n{:}\sigma_n\}$

(location kind assignment) $\Pi ::= \{l_1{::}\kappa_1, \ldots, l_n{::}\kappa_n\}$

(location type assignment) $\Psi ::= \{l_1{:}\varsigma_1, \ldots, l_n{:}\varsigma_n\}$

In addition I use $\Phi$ to range over maps from locations to both kinds and heap constructor
values:

$\Phi ::= \{l_1{::}\kappa_1 = q_1, \ldots, l_n{::}\kappa_n = q_n\}$

Given an assignment $\Phi$, I use $Q_\Phi$ and $\Pi_\Phi$ to represent the heap and location kind
assignment implicit in $\Phi$.

The formation judgments for $\lambda_i^{ML}$-GC are as follows:

1. $\Pi; \Delta \vdash \mu :: \kappa$  constructor formation

2. $\Pi \vdash q :: \kappa$  constructor heap value formation

3. $\Pi \vdash Q :: \Pi'$  constructor heap formation

4. $\Pi \vdash \Phi$  kind and constructor assignment formation

5. $\Pi; \Delta \vdash \sigma$  type formation

6. $\Pi \vdash \Psi$  location type assignment formation

7. $\Pi; \Delta \vdash \Gamma$  variable type assignment formation

8. $\Phi; \Psi; \Delta; \Gamma \vdash e : \sigma$  expression formation

9. $\Phi; \Psi \vdash h : \sigma$  heap value formation

10. $\Phi; \Psi' \vdash H : \Psi$  heap formation

11. $\vdash (Q, H, e) : \sigma$  program formation

The axioms and inference rules that allow us to derive these judgments are largely standard,
so I will only provide a high-level overview. For judgments 1-7, $\Pi$ tracks the kind of
all free locations in the given constructor, constructor heap value, constructor heap, type,
or assignment. For constructors and types, $\Delta$ tracks the kind of free type variables. The
lack of $\Delta$ in the judgments 3-4 indicates that there can be no free type variables, only
free locations within constructor heaps. Constructor heap formation consists of an axiom
and an inference rule that allow us to construct heaps inductively, thereby ensuring the
heap has no cycles:

$$\Pi \vdash \emptyset :: \emptyset \qquad\qquad \frac{\Pi \vdash Q :: \Pi' \qquad \Pi \uplus \Pi' \vdash q :: \kappa}{\Pi \vdash Q \uplus \{l = q\} :: \Pi' \uplus \{l{::}\kappa\}}$$

A kind and constructor assignment $\Phi$ is well-formed with respect to $\Pi$ if $\Pi \vdash Q_\Phi :: \Pi_\Phi$.
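
The inductive reading of heap formation can be illustrated with a small Python sketch. It is not part of the formal development: the encoding of a heap as a dictionary from locations to the set of locations each heap value mentions, and the names `heap_is_well_formed` and `ambient`, are assumptions made only for illustration. The check succeeds exactly when the bindings can be added one at a time, each mentioning only locations that are already bound, which rules out cycles.

```python
def heap_is_well_formed(heap, ambient=frozenset()):
    """Check that `heap` can be built by the axiom and the inductive rule.

    heap:    dict mapping a location to the set of locations its value mentions
             (a hypothetical encoding of Q).
    ambient: locations assumed to be bound outside the heap (the role of Pi).
    """
    bound = set(ambient)
    pending = {loc: set(refs) for loc, refs in heap.items()}
    while pending:
        # A binding may be added once everything it mentions is already bound.
        ready = [loc for loc, refs in pending.items() if refs <= bound]
        if not ready:
            # Every remaining binding depends on an unbound location:
            # either a dangling reference or a cycle.
            return False
        for loc in ready:
            bound.add(loc)
            del pending[loc]
    return True
```

For example, `{"l1": set(), "l2": {"l1"}}` is accepted, while `{"l1": {"l2"}, "l2": {"l1"}}` is rejected because neither binding can be added first.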

Judgments 8-9 require more information than simply the kinds of free locations. This
is because we must treat constructor locations as transparent type variables in order to
recover the same definitional equivalence that we have in a totally substitution-based
semantics. For example, if $l_1$ is bound to $\mathsf{Arrow}(u_1, u_2)$, then we must consider $l_1$ to
be equivalent to $\mathsf{Arrow}(u_1, u_2)$. Therefore, equivalence of both constructors and types is
given with respect to a kind and constructor assignment $\Phi$:

12. $\Phi; \Delta \vdash \mu \equiv \mu' :: \kappa$  constructor equivalence

13. $\Phi; \Delta \vdash \sigma \equiv \sigma'$  type equivalence

The axioms and inference rules that allow us to conclude two constructors or types are
equivalent are standard (see Chapter 3) with the addition of two axioms governing the
transparency of locations:

$$\Phi \uplus \{l{::}\Omega = \mathsf{Arrow}(u_1, u_2)\}; \Delta \vdash l{::}\Omega \equiv (\mathsf{Arrow}(u_1{::}\Omega, u_2{::}\Omega) :: \Omega) :: \Omega$$

$$\Phi \uplus \{l{::}\kappa = \lambda t{::}\kappa'.\mu\}; \Delta \vdash l{::}\kappa \equiv ((\lambda t{::}\kappa'.\mu) :: \kappa) :: \kappa$$

I claim that orienting these equivalences to the right yields a reduction system for
constructors (and hence types) that is both locally confluent and strongly normalizing if $\Phi$
has no cycles, which is true when $\Phi$ is well-formed. It should be fairly straightforward to
extend the proofs of Chapter 4 to show that this claim is true.

Judgment 11, program formation, is determined by the following rule:

$$\frac{\emptyset \vdash \Phi \qquad \Phi; \emptyset \vdash H : \Psi \qquad \Phi; \Psi; \emptyset; \emptyset \vdash e : \sigma}{\vdash (Q_\Phi, H, e) : \sigma}$$

The rule requires that the constructor heap be well-formed and described by $\Phi$, that the
expression heap be well-formed and described by $\Psi$ under the assumptions of $\Phi$, and that
the expression be well-formed with type $\sigma$ under the assumptions $\Phi$ and $\Psi$. Note that
all components must be closed with respect to type and value variables.

Given unique normal forms for types, it is straightforward to define a suitable notion of
normal derivation for $\lambda_i^{ML}$-GC programs and hence show that type checking is decidable.
With the normal derivations in hand, it should be fairly straightforward to prove both
preservation and progress, as in Chapter 4.

Proposition 7.5.1 (Preservation) If $\vdash (Q, H, e) : \sigma$ and $(Q, H, e) \longmapsto (Q', H', e')$,
then $\vdash (Q', H', e') : \sigma$.

Proposition 7.5.2 (Progress) If $\vdash P : \sigma$, then either $P$ is an answer or else there
exists a $P'$ such that $P \longmapsto P'$.

7.5.3 Garbage Collection and $\lambda_i^{ML}$-GC

The definitions of garbage and garbage collection for $\lambda_i^{ML}$-GC are essentially the same as
for Mono-GC, except that I want to eliminate garbage bindings in both the constructor
and the expression heaps. Assuming $FL$ calculates all of the free locations of a program,
then the tracing collection specification for $\lambda_i^{ML}$-GC is simply:

$$\frac{FL(Q_1, H_1, e) = \emptyset}{(Q_1 \uplus Q_2,\ H_1 \uplus H_2,\ e) \stackrel{\text{trace}}{\longmapsto} (Q_1,\ H_1,\ e)}$$

It is easy to prove the diamond and postponement lemmas hold with respect to this
trace step and any other rewriting rule of $\lambda_i^{ML}$-GC. From this, we can conclude that the
tracing collection specification is a sound garbage collection rule.
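
As a rough illustration of the side condition $FL(Q_1, H_1, e) = \emptyset$, the following Python sketch checks whether a proposed pair of sub-heaps is an admissible result of a trace step. The encoding is hypothetical: constructor heap values are represented only by the set of constructor locations they mention, expression heap values by a pair of sets (constructor locations, expression locations), and the expression by its two sets of root locations. The function names are likewise illustrative.

```python
def free_locations(q1, h1, roots):
    """Locations mentioned by (q1, h1, e) that neither heap binds.

    q1:    dict constructor location -> set of constructor locations mentioned
    h1:    dict expression location -> (set of constructor locations,
                                        set of expression locations)
    roots: (constructor locations, expression locations) mentioned by e
    """
    used_con, used_exp = set(roots[0]), set(roots[1])
    for refs in q1.values():
        used_con |= refs
    for con_refs, exp_refs in h1.values():
        used_con |= con_refs
        used_exp |= exp_refs
    return used_con - set(q1), used_exp - set(h1)


def is_trace_step(q, h, roots, q1, h1):
    """True if (q, h, e) may step to (q1, h1, e) under the tracing specification."""
    # The kept bindings must come unchanged from the original heaps.
    kept_from_original = (all(l in q and q[l] == v for l, v in q1.items())
                          and all(l in h and h[l] == v for l, v in h1.items()))
    free_con, free_exp = free_locations(q1, h1, roots)
    return kept_from_original and not free_con and not free_exp
```

Note that the specification does not ask for the smallest closed sub-program; any choice of $Q_1$ and $H_1$ that leaves no free locations is admissible, which is what the sketch checks.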

In  the  rest  of  this  section,  I  will  present  a  type-based,  tag-free  collection  algorithm 
and  give  a  proof  sketch  that  it  is  sound  by  showing  that  any  garbage  it  collects  can  also 
be  collected  by  the  tracing  specification. 

Before  giving  the  collection  algorithm,  I  need  to  define  some  auxiliary  functions  that 
extract  the  range  of  the  constructor  and  value  environments  of  an  abstraction.  Since 
$\lambda_i^{ML}$-GC uses meta-level substitution, these environments are not immediately apparent.
Therefore,  these  auxiliary  functions  must  deconstruct  terms  based  on  abstract  syntax 
to  find  these  values.  In  a  lower-level  model,  as  with  Mono-GC,  where  environments  are 
explicit,  no  such  processing  of  terms  is  required. 

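The ConEnv function maps a constructor $\mu$ to a location kind assignment $\Pi$, by
extracting all of the locations (and their kinds) within the constructor:

$$\begin{array}{lcl}
\mathrm{ConEnv}(t :: \kappa) & = & \emptyset \\
\mathrm{ConEnv}(\mathsf{Int} :: \kappa) & = & \emptyset \\
\mathrm{ConEnv}(l :: \kappa) & = & \{l{::}\kappa\} \\
\mathrm{ConEnv}(\mathsf{Arrow}(\mu_1, \mu_2) :: \kappa) & = & \mathrm{ConEnv}(\mu_1) \cup \mathrm{ConEnv}(\mu_2) \\
\mathrm{ConEnv}((\lambda t{::}\kappa_1.\mu) :: \kappa) & = & \mathrm{ConEnv}(\mu) \\
\mathrm{ConEnv}((\mathsf{Typerec}\ \mu\ \mathsf{of}\ (\mu_i; \mu_a)) :: \kappa) & = & \mathrm{ConEnv}(\mu) \cup \mathrm{ConEnv}(\mu_i) \cup \mathrm{ConEnv}(\mu_a)
\end{array}$$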
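
The definition of ConEnv is a straightforward recursion over abstract syntax, which the following Python sketch mirrors. The tuple encoding of kind-labelled constructors as `(kind, node)` pairs, the tag strings, and the name `con_env` are assumptions chosen for illustration only; kinds are represented as plain strings.

```python
def con_env(con):
    """Collect the locations (with their kinds) occurring in a constructor.

    A constructor is a pair (kind, node), where node is one of:
      ("var", t), ("int",), ("loc", l), ("arrow", c1, c2),
      ("lam", t, k1, body), ("typerec", c, c_int, c_arrow)
    and every sub-constructor is itself a (kind, node) pair.
    """
    kind, node = con
    tag = node[0]
    if tag in ("var", "int"):
        return {}
    if tag == "loc":
        return {node[1]: kind}          # the only case that contributes a location
    if tag == "arrow":
        return {**con_env(node[1]), **con_env(node[2])}
    if tag == "lam":
        return con_env(node[3])         # the bound variable is not a location
    if tag == "typerec":
        return {**con_env(node[1]), **con_env(node[2]), **con_env(node[3])}
    raise ValueError(f"unknown constructor form: {tag!r}")
```

Applied to an encoding of $\mathsf{Arrow}(l_1{::}\Omega, \mathsf{Int}{::}\Omega) :: \Omega$, the sketch returns the single entry for $l_1$ with kind $\Omega$.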


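The TypeEnv function maps a type $\sigma$ to a location kind assignment $\Pi$:

$$\begin{array}{lcl}
\mathrm{TypeEnv}(T(\mu)) & = & \mathrm{ConEnv}(\mu) \\
\mathrm{TypeEnv}(\mathsf{int}) & = & \emptyset \\
\mathrm{TypeEnv}(\sigma_1 \to \sigma_2) & = & \mathrm{TypeEnv}(\sigma_1) \cup \mathrm{TypeEnv}(\sigma_2) \\
\mathrm{TypeEnv}(\forall t{::}\kappa.\sigma) & = & \mathrm{TypeEnv}(\sigma)
\end{array}$$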


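Finally, the ExpEnv function maps an expression $e$ to both a location kind assignment
$\Pi$ and a location type assignment $\Psi$. In the definition of the function, I use $(\Pi_1, \Psi_1) \cup (\Pi_2, \Psi_2)$
to abbreviate $(\Pi_1 \cup \Pi_2, \Psi_1 \cup \Psi_2)$.

$$\begin{array}{lcl}
\mathrm{ExpEnv}(x) & = & (\emptyset, \emptyset) \\
\mathrm{ExpEnv}(v{:}\mathsf{int}) & = & (\emptyset, \emptyset) \\
\mathrm{ExpEnv}(l{:}\sigma_1 \to \sigma_2) & = & (\mathrm{TypeEnv}(\sigma_1 \to \sigma_2),\ \{l{:}\sigma_1 \to \sigma_2\}) \\
\mathrm{ExpEnv}(l{:}\forall t{::}\kappa.\sigma) & = & (\mathrm{TypeEnv}(\sigma),\ \{l{:}\forall t{::}\kappa.\sigma\}) \\
\mathrm{ExpEnv}((\lambda x{:}\sigma'.e) : \sigma) & = & (\mathrm{TypeEnv}(\sigma) \cup \mathrm{TypeEnv}(\sigma'),\ \emptyset) \cup \mathrm{ExpEnv}(e) \\
\mathrm{ExpEnv}((e_1\ e_2) : \sigma) & = & (\mathrm{TypeEnv}(\sigma),\ \emptyset) \cup \mathrm{ExpEnv}(e_1) \cup \mathrm{ExpEnv}(e_2) \\
\mathrm{ExpEnv}((\Lambda t{::}\kappa.e) : \sigma) & = & (\mathrm{TypeEnv}(\sigma),\ \emptyset) \cup \mathrm{ExpEnv}(e) \\
\mathrm{ExpEnv}((e\ [\mu]) : \sigma) & = & (\mathrm{TypeEnv}(\sigma) \cup \mathrm{ConEnv}(\mu),\ \emptyset) \cup \mathrm{ExpEnv}(e) \\
\mathrm{ExpEnv}((\mathsf{typerec}\ \mu\ \mathsf{of}\ [t.\sigma'](e_i; e_a)) : \sigma) & = & (\mathrm{TypeEnv}(\sigma) \cup \mathrm{TypeEnv}(\sigma') \cup \mathrm{ConEnv}(\mu),\ \emptyset) \\
 & & \quad \cup\ \mathrm{ExpEnv}(e_i) \cup \mathrm{ExpEnv}(e_a)
\end{array}$$

Note that, once a small value is reached, we can use type information to determine the
shape of the value. For instance, when processing a small value $v$ labelled with $\mathsf{int}$,
we know that $v$ must be an integer $i$. Hence, we continue to use type information to
determine the shape of small values and heap values, just as I did in the tag-free collection
of Mono-GC.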

The following lemma shows that the ConEnv and TypeEnv functions extract appropriate
kind assignments from well-formed constructors and types.

Lemma 7.5.3

1. If $\Pi; \Delta \vdash (\mu :: \kappa)$, then $\mathrm{ConEnv}(\mu :: \kappa) = \Pi'$ for some $\Pi'$ and $\Pi' \subseteq \Pi$ and
$\Pi'; \Delta \vdash (\mu :: \kappa)$.

2. If $\Pi; \Delta \vdash \sigma$, then $\mathrm{TypeEnv}(\sigma) = \Pi'$ for some $\Pi'$ and $\Pi' \subseteq \Pi$ and $\Pi'; \Delta \vdash \sigma$.

Proof (sketch): Simple induction on $\mu$ and $\sigma$, using the syntax-directedness of the
formation rules. The fact that $\Pi' \subseteq \Pi$ ensures that the inductive hypotheses can be
unioned to form a consistent kind assignment. □

The next lemma shows that we can strengthen the assumptions regarding the equivalence
of two constructors or two types, as long as the strengthened assumptions are
closed and cover the free locations of the constructors or types.

Lemma 7.5.4

1. If $\Phi; \Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$, $\Phi' \subseteq \Phi$, $\emptyset \vdash \Phi'$, $\Pi_{\Phi'}; \Delta \vdash \mu_1$, and $\Pi_{\Phi'}; \Delta \vdash \mu_2$, then
$\Phi'; \Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$.

2. If $\Phi; \Delta \vdash \sigma_1 \equiv \sigma_2$, $\Phi' \subseteq \Phi$, $\emptyset \vdash \Phi'$, $\Pi_{\Phi'}; \Delta \vdash \sigma_1$, and $\Pi_{\Phi'}; \Delta \vdash \sigma_2$, then
$\Phi'; \Delta \vdash \sigma_1 \equiv \sigma_2$.

Proof (sketch): By induction on the derivation of $\Phi; \Delta \vdash \mu_1 \equiv \mu_2 :: \kappa$ and $\Phi; \Delta \vdash \sigma_1 \equiv \sigma_2$. □

Next, I argue that if $e$ has type $\sigma$ under $\Phi; \Psi; \Delta; \Gamma$, then $\mathrm{ExpEnv}(e)$ exists and is some
$(\Pi', \Psi')$ “consistent” with $\Phi$ and $\Psi$. Here, consistent means that, if we start with a subset
of $\Phi$ containing all of the locations bound in $\Pi'$, extend this subset so that it is closed
and covers all of the free locations in $\Gamma$ and $\sigma$ yielding $\Phi'$, then $\Phi'; \Psi'; \Delta; \Gamma$ is sufficient
to show that $e$ has type $\sigma$.

Lemma 7.5.5 If $\Phi; \Psi; \Delta; \Gamma \vdash e : \sigma$, then $\mathrm{ExpEnv}(e) = (\Pi', \Psi')$ and:

1. $\Pi' \subseteq \Pi_\Phi$,

2. $\Psi' \subseteq \Psi$,

3. If $\Phi' \subseteq \Phi$, such that $Dom(\Pi') \subseteq Dom(\Phi')$, $\Pi_{\Phi'} \vdash \Psi'$, $\Pi_{\Phi'}; \Delta \vdash \Gamma$, $\Pi_{\Phi'}; \Delta \vdash \sigma$, and
$\emptyset \vdash \Phi'$, then $\Phi'; \Psi'; \Delta; \Gamma \vdash e : \sigma$.

Proof (sketch): The proof proceeds by induction on $e$ and relies upon the fact that each
typing derivation has a normal form that interleaves non-equiv rules with equiv rules.
The preconditions of part 3 coupled with the previous lemma are sufficient to show that,
for every application of the equiv step, the two types in question are equivalent under
the strengthened assumptions. Parts 1 and 2 are needed to show that the union of the
assumptions for the inductive hypotheses are well-formed contexts. □

Using ConEnv, TypeEnv, and ExpEnv, I can now define functions to extract the locations
and their types or kinds from heap values based on kinds or types. The definitions
of these functions are similar to the various TL functions of Mono-GC, given in Section
7.3. The function KL is a partial function that takes a kind $\kappa$ and a constructor heap
value $q$ and returns the set of free locations in $q$ as well as their kinds.

$$\begin{array}{lcl}
KL[\Omega](\mathsf{Arrow}(u_1, u_2)) & = & \mathrm{ConEnv}(u_1{::}\Omega) \cup \mathrm{ConEnv}(u_2{::}\Omega) \\
KL[\kappa_1 \to \kappa_2](\lambda t{::}\kappa_1.\mu) & = & \mathrm{ConEnv}((\lambda t{::}\kappa_1.\mu) :: \kappa_1 \to \kappa_2)
\end{array}$$

Similarly, the function TL is a partial function that takes a head-normal type and a
heap value $h$ and returns the set of free locations in $h$ as well as their kinds or types.

$$\begin{array}{lcl}
TL[\sigma_1 \to \sigma_2](\lambda x{:}\sigma'.e) & = & \mathrm{ExpEnv}((\lambda x{:}\sigma'.e) : \sigma_1 \to \sigma_2) \\
TL[\forall t{::}\kappa.\sigma](\Lambda t{::}\kappa.e) & = & \mathrm{ExpEnv}((\Lambda t{::}\kappa.e) : \forall t{::}\kappa.\sigma)
\end{array}$$

As for Mono-GC, I express garbage collection as a set of rewriting rules between tuples.
I begin by defining a rewriting system that only operates on constructor heaps. For this
system, tuples are of the form $(Q_f, \Pi_s, Q_t)$ where $Q_f$ is the constructor from-space, $\Pi_s$
is the scan-set describing the kinds of all constructors in the from-space reachable from
the to-space, and $Q_t$ is the to-space. The rewriting rule for constructors is simply:

$$(Q_f \uplus \{l = q\},\ \Pi_s \uplus \{l{::}\kappa\},\ Q_t) \Longrightarrow (Q_f,\ \Pi_s \cup \Pi'_s,\ Q_t \uplus \{l = q\})$$

$$\text{where } \Pi'_s = \{l'{::}\kappa' \in KL[\kappa](q) \mid l' \notin Dom(Q_t) \uplus \{l\}\}$$

If a location $l$, described by $\kappa$, is in the scan-set and $l$ is bound to the heap value $q$ in the
from-space, then we forward the binding $l = q$ to the to-space and add any free location
in $q$ and its kind to the scan-set, unless the location has already been forwarded to the
to-space.
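
The rewriting rule above can be read as the body of a worklist loop. The Python sketch below runs the rule to completion; it is only an illustration under assumed encodings: heaps and scan-sets are dictionaries, and `kl` is a caller-supplied stand-in for the KL function that returns a dictionary from the free locations of a heap value to their kinds. The name `collect_constructors` is hypothetical.

```python
def collect_constructors(from_space, scan_set, kl):
    """Apply the constructor rewriting rule until the scan-set is empty.

    from_space: dict location -> constructor heap value (Q_f)
    scan_set:   dict location -> kind (Pi_s)
    kl:         function (kind, heap value) -> dict of free locations to kinds
    Returns the final (from-space, to-space) pair.
    """
    q_from = dict(from_space)
    scan = dict(scan_set)
    q_to = {}
    while scan:
        loc, kind = scan.popitem()      # choose any location in the scan-set
        value = q_from.pop(loc)         # it must be bound in the from-space
        q_to[loc] = value               # forward the binding to the to-space
        for loc2, kind2 in kl(kind, value).items():
            # Skip locations already forwarded; this includes loc itself,
            # which was just added to the to-space.
            if loc2 not in q_to:
                scan[loc2] = kind2
    return q_from, q_to
```

Whatever remains in the returned from-space is garbage. The order in which `popitem` selects locations is arbitrary; the sketch does not itself establish that the choice is irrelevant.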

The following definition gives the essential invariants of the constructor garbage
collection rewriting system:

Definition 7.5.6 (Constructor GC Well-Formedness) $(Q_f, \Pi_s, Q_t)$ is well-formed
with respect to $\Phi$ iff:

1. $\emptyset \vdash Q_f \uplus Q_t :: \Pi_\Phi$,

2. $\Pi_s \subseteq \Pi_\Phi$,

3. $Dom(\Pi_s) \subseteq Dom(Q_f)$,

4. $\Pi_s \vdash Q_t :: \Pi_t$, where $\Pi_t = \{l{::}\kappa \in \Pi_\Phi \mid l \in Dom(Q_t)\}$.

It is straightforward to prove that constructor well-formedness is preserved by the
rewriting system and that progress is always possible for well-formed tuples.

Lemma 7.5.7 (Constructor GC Preservation) If $T$ is well-formed with respect to
$\Phi$ and $T \Longrightarrow T'$, then $T'$ is well-formed with respect to $\Phi$.

Proof (sketch): Follows from the invariants and lemma 7.5.3 (1). The argument is
similar to the preservation argument for Mono-GC (see lemma 7.3.3). □

Lemma 7.5.8 (Constructor GC Progress) If $T = (Q_f, \Pi_s, Q_t)$ is well-formed with
respect to $\Phi$, then either $\Pi_s$ is empty or else there exists a $T'$ such that $T \Longrightarrow T'$.

Proof: Suppose $\Pi_s = \Pi'_s \uplus \{l{::}\kappa\}$. By the second condition of well-formedness, we
know that $\Pi_\Phi(l) = \kappa$. By the third condition, we know that $Q_f = Q'_f \uplus \{l = q\}$ for
some $Q'_f$ and $q$. By the first condition, we know that $\Pi_\Phi \vdash q :: \kappa$. Hence, $KL[\kappa](q)$
is defined and by lemma 7.5.3, $KL[\kappa](q) \subseteq \Pi_\Phi$. Thus, $\Pi'_s \cup \Pi''$ is well-formed where
$\Pi'' = \{l'{::}\kappa' \in KL[\kappa](q) \mid l' \notin Dom(Q_t) \uplus \{l\}\}$. Therefore, $T \Longrightarrow (Q'_f,\ \Pi'_s \cup \Pi'',\ Q_t \uplus \{l = q\})$. □

Since the size of the from-space always decreases with each step, it is easy to see that
constructor garbage collection always terminates. The progress lemma tells us that the
collection never gets stuck and the preservation lemma tells us that at every step, the
resulting tuple is well-formed with respect to the given $\Phi$.

The  following  lemma  shows  that  constructor  garbage  collection  is  locally  confluent. 
With  the  fact  that  constructor  garbage  collection  always  terminates,  this  implies  that 
constructor  collection  is  confluent.  This  tells  us  that,  no  matter  what  order  we  process 
the  constructors,  we  always  get  the  same  to-space  at  the  end  of  a  collection. 

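Lemma 7.5.9 (Constructor GC Local Confluence) If $T$ is well-formed with respect
to $\Phi$, $T \Longrightarrow T_1$ and $T \Longrightarrow T_2$, then there exists a $T'$ such that $T_1 \Longrightarrow T'$ and $T_2 \Longrightarrow T'$.

Proof: Suppose $T = (Q_f \uplus \{l_1 = q_1, l_2 = q_2\},\ \Pi_s \uplus \{l_1{::}\kappa_1, l_2{::}\kappa_2\},\ Q_t)$,
$T_1 = (Q_f \uplus \{l_2 = q_2\},\ \Pi_s \uplus \{l_2{::}\kappa_2\} \cup \Pi_1,\ Q_t \uplus \{l_1 = q_1\})$
where $\Pi_1 = \{l'{::}\kappa' \in KL[\kappa_1](q_1) \mid l' \notin Dom(Q_t) \uplus \{l_1\}\}$, and
$T_2 = (Q_f \uplus \{l_1 = q_1\},\ \Pi_s \uplus \{l_1{::}\kappa_1\} \cup \Pi_2,\ Q_t \uplus \{l_2 = q_2\})$
where $\Pi_2 = \{l'{::}\kappa' \in KL[\kappa_2](q_2) \mid l' \notin Dom(Q_t) \uplus \{l_2\}\}$.
Let $T' = (Q_f,\ \Pi_s \cup \Pi'_1 \cup \Pi'_2,\ Q_t \uplus \{l_1 = q_1, l_2 = q_2\})$,
where $\Pi'_1 = \{l'{::}\kappa' \in \Pi_1 \mid l' \neq l_2\}$ and $\Pi'_2 = \{l'{::}\kappa' \in \Pi_2 \mid l' \neq l_1\}$. Then it is easy to see
that both $T_1 \Longrightarrow T'$ and $T_2 \Longrightarrow T'$. □

Lemma 7.5.10 If $(Q_f, \Pi_s, Q_t)$ and $(Q_f, \Pi_s \cup \Pi'_s, Q_t)$ are well-formed with respect to $\Phi$,
then $(Q_f, \Pi_s, Q_t) \Longrightarrow^* (Q'_f, \emptyset, Q'_t)$, $(Q_f, \Pi_s \cup \Pi'_s, Q_t) \Longrightarrow^* (Q''_f, \emptyset, Q''_t)$, and $Q'_t \subseteq Q''_t$.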

For expressions, the garbage collection rewriting rules operate on 6-tuples of the
form $(Q_f, H_f, \Pi_s, \Psi_s, Q_t, H_t)$, where $Q_f$ and $H_f$ are the constructor and expression
from-spaces, $\Pi_s$ describes the constructors immediately reachable from $Q_t$, $H_t$, and $\Psi_s$, and
$\Psi_s$ describes the heap values immediately reachable from $H_t$. There are two rewriting
rules at this level. The first rule simply uses the constructor rewriting rule to process a
constructor binding:

$$\frac{(Q_f, \Pi_s, Q_t) \Longrightarrow (Q'_f, \Pi'_s, Q'_t)}{(Q_f, H_f, \Pi_s, \Psi_s, Q_t, H_t) \Longrightarrow (Q'_f, H_f, \Pi'_s, \Psi_s, Q'_t, H_t)}$$

The second rule processes a binding in the expression heap:

$$(Q_f,\ H_f \uplus \{l = h\},\ \Pi_s,\ \Psi_s \uplus \{l{:}\varsigma\},\ Q_t,\ H_t) \Longrightarrow (Q_f,\ H_f,\ \Pi_s \cup \Pi'_s,\ \Psi_s \cup \Psi'_s,\ Q_t,\ H_t \uplus \{l = h\})$$

$$\begin{array}{lcl}
\text{where } (\Pi'', \Psi'') & = & TL[\varsigma](h) \\
\Pi'_s & = & \{l'{::}\kappa' \in \Pi'' \mid l' \notin Dom(Q_t)\} \\
\Psi'_s & = & \{l'{:}\varsigma' \in \Psi'' \mid l' \notin Dom(H_t) \uplus \{l\}\}
\end{array}$$


Note that both of the scan sets are updated when an expression heap value is processed.
The initialization and finalization steps for the full garbage collection are captured
by the following inference rule:

$$\frac{\mathrm{ExpEnv}(e) = (\Pi, \Psi) \qquad (Q, H, \Pi, \Psi, \emptyset, \emptyset) \Longrightarrow^* (Q_f, H_f, \emptyset, \emptyset, Q_t, H_t)}{(Q, H, e) \stackrel{\text{tr-alg}}{\longmapsto} (Q_t, H_t, e)}$$

We initialize the system by extracting the locations and their kinds or types from the
range of the environment of the current expression. This corresponds to extracting the
root locations from the stack and registers of a real implementation. Then, we continue
choosing locations in one of the scan sets, forward this location from the appropriate
from-space to the appropriate to-space, potentially adding new locations to the scan-sets.
Once the scan-sets become empty, the algorithm is finished, and the to-spaces are
taken as the new, garbage collected heaps of the program.
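
Putting the two rewriting rules together with the initialization step gives a collector over both heaps. The following Python sketch is a minimal illustration under assumed encodings, not the formal system: heaps and scan-sets are dictionaries, `kl` and `tl` are caller-supplied stand-ins for KL and TL, and the two root dictionaries play the role of $\mathrm{ExpEnv}(e)$. All names are hypothetical.

```python
def collect(q, h, root_kinds, root_types, kl, tl):
    """Run both rewriting rules until both scan-sets are empty.

    q, h:        constructor heap and expression heap (dict location -> value)
    root_kinds:  dict constructor location -> kind       (Pi from ExpEnv(e))
    root_types:  dict expression location -> type        (Psi from ExpEnv(e))
    kl:          (kind, value) -> dict of constructor locations to kinds
    tl:          (type, value) -> (dict of constructor locations to kinds,
                                   dict of expression locations to types)
    Returns the two to-spaces (Q_t, H_t).
    """
    q_from, h_from = dict(q), dict(h)
    scan_con, scan_exp = dict(root_kinds), dict(root_types)
    q_to, h_to = {}, {}
    while scan_con or scan_exp:
        if scan_con:
            # First rule: process a constructor binding.
            loc, kind = scan_con.popitem()
            value = q_from.pop(loc)
            q_to[loc] = value
            new_con, new_exp = kl(kind, value), {}
        else:
            # Second rule: process an expression heap binding.
            loc, ty = scan_exp.popitem()
            value = h_from.pop(loc)
            h_to[loc] = value
            new_con, new_exp = tl(ty, value)
        # Both scan-sets may grow; already-forwarded locations are skipped.
        scan_con.update({l: k for l, k in new_con.items() if l not in q_to})
        scan_exp.update({l: t for l, t in new_exp.items() if l not in h_to})
    return q_to, h_to
```

The sketch happens to drain the constructor scan-set before touching the expression scan-set; the rewriting system itself leaves that choice open.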

To prove the correctness of the algorithm, I must establish a suitable set of invariants
that guarantees that (a) the algorithm does not become stuck and (b) the resulting
program is closed. The key difficulty in establishing the invariants is that we must conceptually
complete the constructor garbage collection to ensure that enough constructors are
present that we can derive all needed equivalences to type-check the heap and expression
of the program.

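Definition 7.5.11 (Well-Formedness) Suppose $P = (Q, H, e)$ where $\emptyset \vdash \Phi$, $\Phi; \emptyset \vdash H : \Psi$,
and $\Phi; \Psi; \emptyset; \emptyset \vdash e : \sigma$. The tuple $T = (Q_f, H_f, \Pi_s, \Psi_s, Q_t, H_t)$ is well-formed with
respect to $P$, $\Phi$, $\Psi$, and $\sigma$ iff:

1. $(Q_f, \Pi_s, Q_t)$ is well-formed with respect to $\Phi$ and thus, for some $\Phi' \subseteq \Phi$,
$(Q_f, \Pi_s, Q_t) \Longrightarrow^* (Q'_f, \emptyset, Q_{\Phi'})$,

2. $H = H_f \uplus H_t$,

3. $\Psi_s \subseteq \Psi$ and $Dom(\Psi_s) \subseteq Dom(H_f)$,

4. $\Phi'; \Psi_s \vdash H_t : \Psi_t$ where $\Psi_t = \{l{:}\varsigma \in \Psi \mid l \in Dom(H_t)\}$, and

5. $\Phi'; \Psi_s \uplus \Psi_t; \emptyset; \emptyset \vdash e : \sigma$.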

Roughly  speaking,  the  invariants  guarantee  that  (1)  constructor  garbage  collection  can 
proceed  to  some  appropriate  final  state,  (2)  all  of  the  values  in  the  expression  heap  are 
accounted  for,  (3)  the  scan-set  is  consistent  with  the  global  location  type  assignment 
$\Psi$ and describes a frontier of locations bound in the from-space, (4) after constructor
collection is complete, the resulting constructor to-space and expression scan-set cover
the free locations in the to-space and (5) all of the free locations in $e$ and $\sigma$ are covered
by  the  to-spaces  or  scan-sets. 

The following lemma shows that the invariants are strong enough to guarantee that,
at the end of rewriting, the resulting program is a collection of the original program.

Lemma 7.5.12 Let $P = (Q, H, e)$ where $\emptyset \vdash \Phi$, $\Phi; \emptyset \vdash H : \Psi$, and $\Phi; \Psi; \emptyset; \emptyset \vdash e : \sigma$, and
suppose $T = (Q_f, H_f, \emptyset, \emptyset, Q_t, H_t)$ is well-formed with respect to $P$, $\Phi$, $\Psi$, and $\sigma$. Then
$(Q_t, H_t, e)$ is a collection of $P$.

Proof: From the correctness of the tracing collection specification, it suffices to show
that $(Q_t, H_t, e)$ is closed. Since $T$ is well-formed, we know from the first invariant that
$(Q_f, \emptyset, Q_t)$ is well-formed with respect to $\Phi$. Thus, taking $\Phi' \subseteq \Phi$ such that $Dom(\Phi') = Dom(Q_t)$,
we know that $\emptyset \vdash Q_t :: \Pi_{\Phi'}$ and thus $\emptyset \vdash \Phi'$. From the fourth invariant, we know
that $\Phi'; \emptyset \vdash H_t : \Psi_t$ where $\Psi_t = \{l{:}\varsigma \in \Psi \mid l \in Dom(H_t)\}$ and from the fifth invariant,
$\Phi'; \Psi_t; \emptyset; \emptyset \vdash e : \sigma$. Consequently, $\vdash (Q_t, H_t, e) : \sigma$ and $(Q_t, H_t, e)$ is closed. Therefore,
$P \stackrel{\text{trace}}{\longmapsto} (Q_t, H_t, e)$ and thus $P \simeq (Q_t, H_t, e)$. □

Since  either  the  size  of  the  constructor  from-space  or  the  expression  from-space  strictly 
decreases  at  each  step,  it  is  clear  that  the  rewriting  system  either  terminates  or  gets  stuck. 

Lemma 7.5.13 (Preservation) If $T$ is well-formed with respect to $P$, $\Phi$, $\Psi$, and $\sigma$,
and $T \Longrightarrow T'$, then $T'$ is well-formed with respect to $P$, $\Phi$, $\Psi$, and $\sigma$.

Proof: In the first case, $T \Longrightarrow T'$ via the constructor garbage collection rule.
Well-formedness of $T'$ is guaranteed by preservation of the constructor GC invariants, with
confluence of constructor GC.

In the second case, $T = (Q_f,\ H_f \uplus \{l = h\},\ \Pi_s,\ \Psi_s \uplus \{l{:}\varsigma\},\ Q_t,\ H_t)$ and
$T' = (Q_f,\ H_f,\ \Pi_s \cup \Pi'_s,\ \Psi_s \cup \Psi'_s,\ Q_t,\ H_t \uplus \{l = h\})$, where $TL[\varsigma](h) = (\Pi'', \Psi'')$,
$\Pi'_s = \{l'{::}\kappa' \in \Pi'' \mid l' \notin Dom(Q_t)\}$, and $\Psi'_s = \{l'{:}\varsigma' \in \Psi'' \mid l' \notin Dom(H_t) \uplus \{l\}\}$.
Lemma 7.5.5 and the definition of TL tell us that $\Pi'' \subseteq \Pi_\Phi$ and $\Psi'' \subseteq \Psi$. Thus $\Pi_s \cup \Pi'_s$ is a
well-formed location kind assignment that is a subset of $\Pi_\Phi$. Thus, invariant (1), that
$(Q_f, \Pi_s \cup \Pi'_s, Q_t)$ is well-formed, is satisfied.
Invariant (2) is trivially satisfied, since, by assumption, $H = H_f \uplus H_t \uplus \{l = h\}$. Invariant
(3) is satisfied if $\Psi'_s \subseteq \Psi$ and $Dom(\Psi'_s) \subseteq Dom(H_f)$. The former condition holds since
$\Psi'' \subseteq \Psi$ and the latter condition holds by construction of $\Psi'_s$.

By invariant (1), $(Q_f, \Pi_s, Q_t) \Longrightarrow^* (Q'_f, \emptyset, Q'_t)$ for some $Q'_f$ and $Q'_t$, where $\Phi' \subseteq \Phi$
and $Q_{\Phi'} = Q'_t$. Furthermore, $(Q_f, \Pi_s \cup \Pi'_s, Q_t) \Longrightarrow^* (Q''_f, \emptyset, Q''_t)$ and taking $\Phi'' \subseteq \Phi$
such that $Q_{\Phi''} = Q''_t$, by lemma 7.5.10 we know that $\Phi' \subseteq \Phi''$. By invariant (4),
$\Phi'; \Psi_s \uplus \{l{:}\varsigma\} \vdash H_t : \Psi_t$. Since $\Phi' \subseteq \Phi''$, $\Phi''; \Psi_s \uplus \{l{:}\varsigma\} \vdash H_t : \Psi_t$. By lemma 7.5.5, we
know that $\Phi''; (\Psi_s \cup \Psi'_s) \uplus \Psi_t; \emptyset; \emptyset \vdash h : \varsigma$. Thus, $\Phi''; \Psi_s \cup \Psi'_s \vdash H_t \uplus \{l = h\} : \Psi_t \uplus \{l{:}\varsigma\}$
and invariant (4) is satisfied.

Finally, by invariant (5), $\Phi'; \Psi_s \uplus \Psi_t \uplus \{l{:}\varsigma\}; \emptyset; \emptyset \vdash e : \sigma$. Thus,
$\Phi''; (\Psi_s \cup \Psi'_s) \uplus \Psi_t \uplus \{l{:}\varsigma\}; \emptyset; \emptyset \vdash e : \sigma$ and invariant (5) is satisfied. □

Lemma 7.5.14 (Progress) If $T = (Q_f, H_f, \Pi_s, \Psi_s, Q_t, H_t)$ is well-formed with respect
to $P$, $\Phi$, $\Psi$, and $\sigma$, then either $\Pi_s$ and $\Psi_s$ are empty or else there exists a $T'$ such that
$T \Longrightarrow T'$.

Proof: If $\Pi_s$ is non-empty, then progress is guaranteed by constructor GC progress.
If the expression scan-set is non-empty, then it is of the form $\Psi_s \uplus \{l{:}\varsigma\}$. By invariant
(3), $l{:}\varsigma \in \Psi$ and $l$ must be bound in the from-space. Thus, assume the from-space is
of the form $H_f \uplus \{l = h\}$ for some $H_f$ and $h$. By lemma 7.5.5 and the definition of TL,
we know that $TL[\varsigma](h)$ is defined and is some $(\Pi'', \Psi'')$ such that $\Pi'' \subseteq \Pi_\Phi$ and $\Psi'' \subseteq \Psi$.
Therefore, taking $\Pi'_s = \{l'{::}\kappa' \in \Pi'' \mid l' \notin Dom(Q_t)\}$ and
$\Psi'_s = \{l'{:}\varsigma' \in \Psi'' \mid l' \notin Dom(H_t) \uplus \{l\}\}$, we know that $\Pi_s \cup \Pi'_s$ is well-formed and
$\Psi_s \cup \Psi'_s$ is well-formed. Thus,
$T \Longrightarrow (Q_f,\ H_f,\ \Pi_s \cup \Pi'_s,\ \Psi_s \cup \Psi'_s,\ Q_t,\ H_t \uplus \{l = h\})$. □

Corollary 7.5.15 (Tracing Algorithm Correctness) If $P$ is well-typed, then there
exists a $P'$ such that $P \stackrel{\text{tr-alg}}{\longmapsto} P'$ and $P \simeq P'$.

Proof: Follows immediately from lemma 7.5.12, Preservation, and Progress. □

7.6  Related  Work 

The literature on garbage collection in sequential programming languages per se contains
few papers that attempt to provide a compact characterization of algorithms or correctness
proofs. Demers et al. [34] give a model of memory parameterized by an abstract
notion of a “points-to” relation. As a result, they can characterize reachability-based
algorithms including mark-sweep, copying, generational, “conservative,” and other
sophisticated forms of garbage collection. However, their model is intentionally divorced
from the programming language and cannot take advantage of any semantic properties
of evaluation, such as type preservation. Consequently, their framework cannot model
the type-based collectors I describe here. Nettles [98] provides a concrete specification
of a copying garbage collection algorithm using the Larch specification language. My
specification of the free-variable tracing algorithm is essentially a high-level, one-line
description of his specification.

Hudak gives a denotational model that tracks reference counts for a first-order language
[67]. He presents an abstraction of the model and gives an algorithm for computing
approximations of reference counts statically. Chirimar, Gunter, and Riecke give a framework
for proving invariants regarding memory management for a language with a linear
type system [30]. Their low-level semantics specifies explicit memory management based
on reference counting. Both Hudak and Chirimar et al. assume a weak approximation
of garbage (reference counts). Barendsen and Smetsers give a Curry-like type system for
functional languages extended with uniqueness information that guarantees an object is
only “locally accessible” [16]. This provides a compiler enough information to determine
when certain objects may be garbage collected or over-written.

Tolmach  [119]  built  a  type-recovery  collector  for  a  variant  of  SML  that  passes  type 
information  to  polymorphic  routines  during  execution.  Aditya  and  Caro  gave  a  type- 
recovery  algorithm  for  an  implementation  of  Id  that  is  equivalent  to  type  passing  [5]  and 
Aditya,  Flood,  and  Hicks  extended  this  work  to  garbage  collection  for  Id  [6].  In  both 
collectors,  bindings  for  type  variables  are  accumulated  in  type  environments  as  I  propose 
here. 

However, the type systems of these languages are considerably simpler than $\lambda_i^{ML}$. In
particular, they only support instantiation of polytypes and not general forms of computation
(e.g., function call and Typerec). Furthermore, neither of these implementations
allowed terms to examine types for operations, such as polymorphic equality or dynamic
argument flattening. Tolmach took advantage of these properties by delaying the
computation of a type instantiation until this instantiation was needed during garbage
collection. In essence, he represented types as closures - a pair consisting of a type
environment and a term with free variables whose bindings could be found in the environment.
His “lazy” strategy for type instantiation avoided constructing types that are
unneeded outside garbage collection.

In contrast, for languages like $\lambda_i^{ML}$, I propose computing type information eagerly to
ensure that no computation, and thus no allocation, occurs during garbage collection. The
TIL compiler uses a hybrid tag-free scheme to avoid constructing types for all values. TIL
also performs various optimizations to share as many type computations as is possible.
These and other “real-world” implementation issues are discussed in Chapter 8.

Over the past few years, a number of papers on inference-based collection in monomorphic
[22, 129, 23] and polymorphic [8, 49, 50, 43] languages appeared in the literature.
Appel [8] argued informally that “tag-free” collection is possible for polymorphic languages
such as SML by a combination of recording information statically and performing
what amounts to type inference during the collection process, though the connections
between inference and collection were not made clear. Baker [14] recognized that Milner-style
type inference can be used to prove that reachable objects can be safely collected,
but did not give a formal account of this result. Goldberg and Gloger [50] recognized
that it is not possible to reconstruct the concrete types of all reachable values in an
implementation of an ML-style language that does not pass types to polymorphic routines.
They gave an informal argument based on traversal of stack frames to show that such
values are semantically garbage. Fradet [43] gave another argument based on Reynolds’s
abstraction/parametricity theorem [104].

The style of semantics I use here is closely related to the allocation semantics used in
my previous work on garbage collection [96, 95], but is slightly lower-level. In particular,
I use closures and environments to implement substitution. In this respect, the semantics
is quite similar to the SECD [80] and CEK machines [40]. The primary difference between
my approach and these machines is that I make the heap explicit, which enables me to
define a suitable notion of garbage and garbage collection.


Chapter  8 

The  TIL/ML  Compiler 


TIL, which stands for Typed Intermediate Languages, is a batch compiler that translates
a  subset  of  Standard  ML  to  DEC  Alpha  assembly  language.  Together  with  David  Tarditi, 
Perry  Cheng,  and  Chris  Stone,  I  have  constructed  TIL  to  explore  some  of  the  practical 
issues  of  type-directed  translation  and  dynamic  type  dispatch.  In  this  chapter,  I  give  an 
overview  of  the  design  and  implementation  of  TIL,  catalog  its  features  and  drawbacks, 
and  compare  it  to  Standard  ML  of  New  Jersey  —  one  of  the  best  SML  compilers  currently 
available. 

Throughout  this  chapter,  I  use  the  plural  (e.g.,  “we”)  when  referring  to  Tarditi, 
Cheng,  Stone,  and  me.  I  reserve  the  singular  (e.g.,  “I”)  when  referring  only  to  myself. 


8.1  Design  Goals  of  TIL 

In  designing  TIL,  our  primary  goal  was  to  make  the  common  case  fast,  possibly  at  the 
expense  of  a  less  common  case.  For  example,  all  functions  in  SML  take  one  argument; 
multiple  arguments  are  simulated  by  using  a  tuple  as  the  argument.  From  previous  studies 
[81,  110],  we  determined  that  most  functions  do  not  use  the  tuple  argument  except  to 
extract  the  components  of  the  tuple.  Consequently,  we  wanted  TIL  to  translate  functions 
so  that  they  take  tuple  components  in  registers  as  multiple  arguments,  thereby  avoiding 
constructing  the  argument  tuple. 
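
As a rough source-level illustration of this goal (a sketch only, not TIL's actual output), the pair of functions below contrasts a tuple-taking function with a flattened worker. The names sumSq and sumSqFlat are hypothetical, and currying merely stands in for passing the components in separate registers.

```sml
(* Flattened worker: the two components arrive as separate arguments,
   so no argument tuple has to be constructed. *)
fun sumSqFlat (x : int) (y : int) : int = x * x + y * y

(* Source-level view: a single tuple argument that is only taken apart
   and never used as a whole. *)
fun sumSq (p : int * int) : int = sumSqFlat (#1 p) (#2 p)
```

A function that does use its tuple argument as a whole would still need the tuple, which is the less common case that the design is willing to slow down.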

Our  secondary  goal  was  to  use  type-directed  translation  to  propagate  type  information 
through  as  many  stages  of  compilation  as  was  possible.  The  idea  was  to  try  to  discover 
new  ways  that  types  could  be  used  in  the  lower-levels  of  a  compiler.  Some  uses  of  types 
came  at  a  surprisingly  low  level.  For  instance,  we  used  type  information  to  orient  switch 
arms  to  maximize  correctly  predicted  branches  for  list  and  tree-processing  code.  To 
support  tag-free  garbage  collection,  we  knew  that  it  was  necessary  to  hang  on  to  as  much 
type  information  for  as  long  as  was  possible.  To  accomplish  this,  we  needed  a  suitably 
expressive, typed intermediate form.

Another design goal was to leverage existing tools as much as possible. For instance,
we decided to emit Alpha assembly language and let the native assembler handle
instruction scheduling and opcode emission. We were careful to use standard Unix tools,
such as ld, so that we could take advantage of profilers, debuggers, and other widely used
tools. Also, to avoid constructing a parser, type-checker, and pattern match compiler,
we decided to use the front end of the ML Kit Compiler [19].

Finally, we wanted TIL to be as interoperable with other languages (notably C, C++,
and Fortran) as possible, without compromising the efficiency of conventional SML code.
The goal was to support efficient access to library routines, system calls, and hardware,
which is needed for “systems” programming in SML as proposed by the Fox project [56].
To this end, we decided to use tag-free garbage collection to support untagged, unboxed
integers and pointers, since most arguments to libraries or system calls involve these two
representations. Also, we made the register allocator aware of the standard C calling
convention so that C functions could be directly called from SML code.

We decided not to use a fully tag-free collector, but instead to place tag words before
heap-allocated objects (i.e., records and arrays). For records, we decided to use a bit map
to describe which components are pointers. Arrays are uniform, so we planned to use a
single bit in the length tag to tell whether or not the contents of the array are pointers.
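
The record bit map can be pictured with the following minimal sketch. The datatype fieldRep, the function pointerMask, and the choice of bit i for field i are assumptions made for illustration; the actual tag layout used by TIL is not specified here.

```sml
(* Hypothetical classification of record fields. *)
datatype fieldRep = NonPtr | Ptr

(* Build a bit map in which bit i is set exactly when field i is a pointer. *)
fun pointerMask (fields : fieldRep list) : word =
  let
    fun loop ([], _ : word, acc : word) = acc
      | loop (Ptr :: rest, i, acc) =
          loop (rest, i + 0w1, Word.orb (acc, Word.<< (0w1, i)))
      | loop (NonPtr :: rest, i, acc) = loop (rest, i + 0w1, acc)
  in
    loop (fields, 0w0, 0w0)
  end
```

When the field representations are known statically, a mask of this kind is a compile-time constant, which matches the expectation that most tags need not be computed at run time.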

Most allocators for languages like C and C++ also use header words for heap-allocated
objects, so the proposed scheme would not sacrifice interoperability. Furthermore, we
suspected that most tags could be computed at compile time, so the cost of constructing
these tags would not be prohibitive. We also felt that tagging heap objects would simplify
the garbage collector since we could use standard breadth-first copying collection once
the “roots” (i.e., registers and stack) had been scanned. Fully tag-free collectors cannot
use the standard breadth-first scan, because there is not always space for a forwarding
pointer in a tag-free object¹. Furthermore, we were worried that the sizes of tables
that contain full type information might be excessively large [35]. Taking all of these
factors into account, we felt that leaving integers and pointers untagged, but tagging
heap-allocated objects, had the most virtues.

We decided not to unbox double-precision floating point values except within functions
and within arrays. We felt that unboxing doubles in arrays would be important for
scientific code (e.g., matrix operations). Unboxing double function arguments requires a
more complicated approach to calling conventions and register allocation in the presence
of polymorphism. (I discuss this issue in Section 8.8.) Similarly, unboxed floating point
values in records require a more complicated mechanism for calculating the sizes and tags
of records, as well as the offsets of fields within records. We were unsure whether the
run-time costs of the more complicated mechanisms would outweigh the benefits of unboxed
doubles, especially for conventional SML code, which typically manipulates many records
and few doubles.

¹ Yasuhiko Minamide pointed this out to me and showed that Tolmach failed to properly
correct for this in his tag-free collector [119].

We  did  design  the  compiler  so  that  unboxed  doubles  could  either  be  passed  as  function 
arguments  or  placed  in  records  for  monomorphic  code.  We  hope  to  use  TIL  in  the 
future  to  explore  the  tradeoffs  of  various  mechanisms  that  support  unboxed  doubles  for 
polymorphic  code. 


8.2  Overview  of  TIL 

Figure 8.1 gives a block diagram of the stages of the TIL compiler. All of the
transformations are written in Standard ML. In this section, I give a brief overview of
each stage and in the following sections, I provide more detailed information.

The  first  phase  of  TIL  uses  the  front  end  of  the  ML  Kit  compiler  [19]  to  parse,  type 
check,  and  elaborate  SML  source  code.  The  Kit  produces  annotated  abstract  syntax 
for  the  full  SML  language  and  then  compiles  a  subset  of  this  abstract  syntax  to  an 
explicitly-typed  core  language  called  Lambda.  The  compilation  to  Lambda  eliminates 
pattern  matching  and  various  derived  forms. 

I extended Lambda to support signatures, structures (modules), and separate
compilation. Each source module is compiled to a Lambda module with an explicit list of
imported modules and their signatures. Imported signatures may include transparent
definitions of types defined in other modules. Hence, TIL supports a limited form of
translucent [58] or manifest types [83].

I extended the mapping from SML abstract syntax to Lambda so that SML structures
are mapped to Lambda structures with transparent imported types. Currently,
the mapping to Lambda does not handle source-level signatures, nested structures, or
functors. In principle, however, all of these constructs are supported by the intermediate
languages of TIL.

The next phase of TIL uses an intermediate language called Lmli. Lmli is a “real
world” version of λ_i^ML, providing constructs for dynamic type dispatch, efficient data
representations, recursive functions, arrays, and so forth. In the translation of Lambda
to Lmli, we use these constructs to provide tag-free polymorphic equality, specialized
arrays, efficient data representations, and multi-argument functions. The argument
flattening and implementation of polymorphic equality are based on the formal
type-directed translation of Chapter 5.

As with λ_i^ML, type checking Lmli terms is decidable. I provide a kind checker,
constructor normalizer, and type checker for Lmli. We also provide support for
pretty-printing Lmli terms. Currently, the type checker is quite slow because I normalize
all types before comparing them. A much better approach is to compare types directly and
normalize components only if they do not match.

Figure 8.1: Stages in the TIL Compiler

Lmli-Bform (or simply Bform) is a subset of Lmli similar to A-normal form [42]. It
provides a more regular intermediate language than Lmli to facilitate optimization.
Because Bform is a subset of Lmli, we can use all of the Lmli tools, including the type
checker and pretty printer, on the Bform representation². We perform a wide variety of
optimizations on the Bform representation of a program including dead code elimination;
uncurrying; constant folding; constant typecase and switch elimination; inlining of
constructor functions, term functions, type functions and switch continuations; common
sub-expression elimination; redundant switch elimination; and invariant removal.
Because the optimization phases use Bform for both the source and target language, the
output of each phase can be checked for type correctness.

Most of the design and implementation of the optimizer is not my work, and is
described fully in Tarditi’s thesis [115]. It is interesting to note that working with a
typed  intermediate  form  did  not  constrain  the  set  of  optimizations  that  Tarditi  wished 
to  perform,  and  that  types  could  be  used  to  perform  some  optimizations  that  were  not 
possible  in  an  untyped  setting.  However,  working  with  a  typed  intermediate  form  did 
have  some  drawbacks.  In  particular,  our  typed  intermediate  form  needs  more  constructs 
(e.g.,  typecase)  than  a  comparable  untyped  form.  This  makes  the  optimizer  code  bigger 
since  there  are  more  cases  to  process.  In  turn,  this  increases  the  likelihood  of  introducing 
bugs  in  compilation.  However,  we  found  that  the  ability  to  type-check  the  output  of  the 
optimizer  often  mitigated  this  drawback. 

After Bform optimization, we perform closure conversion, mapping the Bform
representation to a language called Lmli-Close. In fact, all of the constructs of Lmli-Close
are  present  in  Bform  but  are  unused  until  the  closure  phase  of  the  compiler.  Hence, 
Lmli-Close  is  a  refinement  of  Lmli-Bform,  much  the  same  as  Lmli-Bform  is  a  refinement 
of  Lmli.  The  conversion  is  based  on  the  type-directed  closure  translation  described  in 
Chapter  6.  However,  following  Kranz  [78],  we  calculate  the  set  of  functions  that  do  not 
“escape”  and  avoid  constructing  closures  for  such  functions.  Because  Lmli-Close  is  a 
subset  of  Bform,  we  can  use  both  the  optimizer  and  the  type  checker  on  the  closure 
converted  code. 

After  closure  conversion,  we  translate  the  resulting  code  to  an  untyped  Bform,  called 
Ubform.  Instead  of  annotating  variables  with  types,  Ubform  requires  that  we  annotate 
variables with representation information. The translation to Ubform erases the
distinction between computations at the constructor and term levels. For example, a constructor
function  call  looks  the  same  as  a  term  function  call. 

The next phase of TIL maps Ubform programs to the Rtl intermediate form. Rtl,
which stands for register transfer language, provides an idealized RISC instruction set,
with a few heavy-weight instructions and an infinite number of registers. After Rtl, we
perform register allocation and map the resulting code to DEC Alpha assembly language.

² The actual ML datatypes used for Lmli and Bform differ, but we provide a simple map from
Bform to Lmli. If SML provided refinement types [44], then we could have defined Bform as
a refinement of Lmli and avoided this extra piece of code.

We  use  the  system  assembler  to  translate  Alpha  assembly  language  to  binaries  and 
the  system  linker  to  link  these  binaries  with  the  runtime  system.  The  runtime  is  written 
in  C  and  provides  code  for  initialization,  memory  management,  and  multi-threading.  The 
garbage  collector  uses  table  information  generated  at  the  Rtl  level  to  determine  which 
registers  and  stack  slots  contain  pointer  values.  The  rest  of  the  garbage  collector  is  a 
standard,  two-space  copying  collector. 


8.3  SML  and  Lambda 

Currently,  we  do  not  support  the  signatures,  nested  structures,  or  functors  of  Standard 
ML.  There  are  also  some  parts  of  core  SML  that  we  do  not  support.  In  order  to  assign 
a  quantified,  polymorphic  type  to  an  expression,  we  require  that  that  expression  be 
syntactically  equivalent  to  a  value.  By  value,  we  mean  that  the  expression  must  be 
a  constant,  a  record  of  values,  a  data  constructor  (besides  ref)  applied  to  values,  or 
a  function.  This  so-called  “value  restriction”  is  necessary  to  support  a  type-passing 
interpretation  of  SML,  since  polymorphic  computations  are  represented  as  functions.  The 
value  restriction  has  been  proposed  by  others  [63,  57,  82]  as  a  way  to  avoid  the  well-known 
problems  of  polymorphism  and  refs,  exceptions,  continuations,  and  other  constructs  that 
have  computational  effects.  Furthermore,  according  to  a  study  performed  by  Wright 
[131], most SML code naturally obeys the value restriction. The few cases he found that
do not are easily transformed so that they meet this restriction.
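
The syntactic check behind the value restriction can be sketched over a toy expression language. The datatype miniExp and the function isValue below are hypothetical and far smaller than the real SML abstract syntax; they only mirror the four cases listed above.

```sml
(* A toy expression language, introduced only for this illustration. *)
datatype miniExp =
    Const of int
  | Record of miniExp list
  | ConApp of string * miniExp list   (* data constructor applied to arguments *)
  | Fn of string * miniExp
  | App of miniExp * miniExp

(* An expression is a syntactic value if it is a constant, a record of
   values, a data constructor other than ref applied to values, or a function. *)
fun isValue (Const _) = true
  | isValue (Record es) = List.all isValue es
  | isValue (ConApp ("ref", _)) = false
  | isValue (ConApp (_, es)) = List.all isValue es
  | isValue (Fn _) = true
  | isValue (App _) = false
```

Only expressions accepted by such a check may be given a quantified type, so a polymorphic binding can always be represented as a function over types without changing when its effects occur.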

The  other  restriction  on  core  SML  code  involves  datatypes.  We  do  not  support 
recursive  datatypes  of  the  form: 

datatype 'a foo = D1 of ('a * 'a) foo -> int

where a type constructor foo abstracts a type argument 'a, and is defined in terms of
itself applied to a different type containing the abstracted variable (e.g., ('a * 'a) foo).
Representing such a datatype as a predicative constructor is impossible because the type
variable 'a must be abstracted inside the recursion equation governing the definition.
Hence,  the  recursion  equation  defines  a  fixed-point  over  polytypes  instead  of  monotypes. 
Such  datatypes  are  very  rare.  A  cursory  study  showed  that  no  datatypes  of  this  form 
existed  in  either  the  Edinburgh  or  the  SML/NJ  library.  In  many  respects,  the  ability  to 
define  such  datatypes  violates  the  type-theoretic  “essence  of  SML”  [94],  and  thus  I  view 
them  more  as  a  bug  in  the  Definition  [90]  than  a  feature.  In  principle,  TIL  supports 
the  rest  of  the  Standard  ML  Definition,  though  of  course  there  may  be  bugs  in  the 
implementation.

We use the ML Kit Compiler [19] to parse, type-check, and elaborate SML expressions.
The Kit translates SML to annotated abstract syntax. The annotations include position
information for error reporting as well as type information. Next, the annotated abstract
syntax is translated to the Lambda intermediate form. This intermediate form is quite
similar to the language described by Birkedal et al. [19], but I added support for type
abbreviations, structures, and signatures. Also, we added various primitive operations
to the language to support, for instance, unsigned integer operations, logical operations,
and so forth.

The Kit compiler translates core SML annotated abstract syntax to Lambda declarations.
During the translation, all type definitions are hoisted to the top level and almost
all pattern matching is eliminated. Datatype definitions are represented in almost exactly
the same fashion as they are at the source level.

I  modified  the  translation  to  Lambda  to  support  structures  and  a  standard  “prelude” 
environment.  This  environment  contains  bindings  for  commonly  used  functions  such  as 
map and fold, as well as definitions of the built-in types, including arrays, bools, and
strings.  The  prelude  environment  is  conceptually  prepended  onto  every  structure.  This 
allows  the  optimizer  to  easily  inline  functions  such  as  map. 

Some  primitive  types,  notably  strings,  are  defined  in  terms  of  other  datatypes  in  the 
prelude.  For  example,  the  string  type  is  represented  as  a  standard  datatype  of  the  form: 

datatype string_rep =
    C000 | C001 | C002 | ... | C255
  | stringrep_str of int * (int array)

The data constructors C000 through C255 correspond to 8-bit ASCII characters, whereas the
data constructor stringrep_str corresponds to strings of length 0 or of length greater
than 1. We found that distinguishing characters from other strings was important since
characters could always be represented unboxed and character comparison could be
implemented efficiently³. Strings of more than one character are represented as integer
arrays. Since integers are untagged, each integer in the array holds 4 characters on a
32-bit machine⁴. The extra int paired with the integer array indicates the length of
the string in characters. All of the string primitives, including implode, explode, and
append, are implemented by a combination of pattern matching and array operations.
The ability to manipulate strings a word at a time using the standard integer array
operations simplified the implementation greatly without sacrificing performance.
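
To make the packing concrete, here is a minimal sketch that stores four characters per 32-bit word. It is an illustration under stated assumptions, not TIL's code: it uses a Word32.word list rather than an int array, the function names are hypothetical, and placing the first character in the low-order byte is an arbitrary choice.

```sml
(* Pack at most four characters into one 32-bit word, first character in
   the low-order byte. The byte order is an assumption. *)
fun packWord (cs : char list) : Word32.word =
  let
    fun loop ([], _ : word, acc : Word32.word) = acc
      | loop (c :: rest, shift, acc) =
          loop (rest, shift + 0w8,
                Word32.orb (acc, Word32.<< (Word32.fromInt (Char.ord c), shift)))
  in
    loop (cs, 0w0, 0w0)
  end

(* Split a character list into groups of four and pack each group. *)
fun chunks ([] : char list) : Word32.word list = []
  | chunks cs =
      let
        val n = Int.min (4, length cs)
      in
        packWord (List.take (cs, n)) :: chunks (List.drop (cs, n))
      end

(* Pair the length in characters with the packed words, mirroring the
   int * (int array) argument of stringrep_str. *)
fun packString (s : string) : int * Word32.word list =
  (size s, chunks (explode s))
```

The explicit length is what allows the final, partially filled word to be interpreted correctly.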

The translation does provide support for separate compilation. In particular, each
SML structure is compiled to a Lambda module containing a list of all imported modules
and their Lambda signatures. These signatures can, but need not, contain transparent
or manifest type definitions. Since we account for all external references in the list of
imported modules, each module can be compiled separately from other modules in the
program. This scheme relies upon unique module names at link time to resolve
inter-module references.

³ Earlier versions of SML/NJ used a similar representation, but newer versions expose the
character datatype to the programmer. Our string representation is compatible with either
approach.

⁴ Although the Alpha is a 64-bit processor, we represent integers and pointers as 32-bit
values.


8.4  Lmli 

The Lmli intermediate form is based on the formal λ_i^ML calculus of Chapter 3, but provides
a  suitably  rich  set  of  constructs  to  support  efficient  compilation  of  Lambda  datatypes 
and  terms.  In  this  section,  I  present  various  details  regarding  Lmli,  including  the  SML 
datatype  used  to  represent  the  kinds,  constructors,  types,  and  terms  of  Lmli. 

8.4.1  Kinds,  Constructors,  and  Types  of  Lmli 

The  SML  datatype  definitions  for  a  subset  of  the  kinds  and  constructors  of  Lmli  are  given 
in  Figure  8.2.  To  simplify  the  presentation,  I  have  eliminated  all  of  the  constructs  that 
are  used  only  for  closure  conversion.  These  constructs  are  discussed  in  Section  8.7. 

All constructors are labelled with a kind. This simplifies kind checking and constructor
manipulation, since we can always determine a constructor’s kind with no additional
context. A more sophisticated implementation might elide much of this information and
reconstruct it as needed. The kind Mono_k is the ASCII representation of Ω and thus
represents all constructors corresponding to monotypes. Kinds include Mono_k, n-ary
products (Tuple_k), arrow kinds (Arrow_k), and list kinds (List_k). In TIL, we elide the
distinction between a constructor μ of kind Mono_k and the type T(μ). Instead, we use
the special kind Poly_k to distinguish types from constructors⁵.

Constructors include variables, projections from modules (Dot_c), primitive constructors
of zero or one argument, tuple intro and elim forms, list intro and elim forms, function
intro and elim forms, recursive constructors (Mu_c), and monotype elim forms. I briefly
discuss each of these constructs here, and then provide more details regarding primitive
constructors.

⁵ TIL actually provides more kind structure than is shown here, in order to check
well-formedness of types.

datatype kind = Mono_k | Tuple_k of kind list
              | Arrow_k of kind * kind | List_k of kind | Poly_k

datatype primcon = Primcon0 of primcon0 | Primcon1 of primcon1

datatype con = Con of (raw_con * kind)
and raw_con =
    Var_c of var
  | Dot_c of strid * int * label
  | Prim0_c of primcon0
  | Prim1_c of primcon1 * con
  | Tuple_c of con list
  | Proj_c of int * con
  | Nil_c of kind
  | Cons_c of con * con
  | Listcase_c of {arg : con, nil_c : con, cons_c : confn}
  | Fold_c of {arg : con, nil_c : con, cons_c : confn}
  | Fn_c of confn
  | App_c of con * con
  | Mu_c of ((var * con) list) * con
  | Typecase_c of {arg : con,
                   arms : (primcon * confn) list,
                   default : con,
                   kind : kind}
  | Typerec_c of {arg : con,
                  arms : (primcon * confn) list,
                  default : con,
                  kind : kind}
  | Let_c of (var * kind * con * con)
  | All_c of (var * kind) list * con
and confn = CF of (var * kind * con)

Figure 8.2: Kinds and Constructors of Lmli

The list intro forms include Nil_c and Cons_c. The list elim forms include both a
fold for lists (Fold_c) and a simple case mechanism (Listcase_c). The formation rule
for the fold construct is roughly

\[
\frac{\Delta \vdash \mu :: \mathtt{List\_k}(\kappa') \qquad
      \Delta \vdash \mu_1 :: \kappa \qquad
      \Delta \vdash \mu_2 :: \mathtt{Tuple\_k}[\kappa', \mathtt{List\_k}(\kappa'), \kappa] \to \kappa}
     {\Delta \vdash \mathtt{Fold\_c}\{\mathtt{arg}=\mu,\ \mathtt{nil\_c}=\mu_1,\ \mathtt{cons\_c}=\mu_2\} :: \kappa}
\]

The arg component must be of list kind and thus evaluates to either a Nil_c or else a
Cons_c constructor. The nil_c component is selected for the base case and the cons_c
component is selected for the inductive case. The head and tail of a Cons_c cell are
passed to the cons_c component as arguments, together with the result of unrolling the
fold on the tail of the list. The formation rule for listcase is similar, but the cons_c
component only takes the head and tail as arguments.
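
The difference between the two elim forms can be illustrated at the term level with ordinary SML lists. This is only an analogy under the reading given above: foldList and caseList are hypothetical names, and the real Fold_c and Listcase_c operate on constructors rather than on values.

```sml
(* Analogue of Fold_c: the cons case receives the head, the tail, and the
   result of folding the tail. *)
fun foldList (nilCase : 'b) (consCase : 'a * 'a list * 'b -> 'b)
             (xs : 'a list) : 'b =
  case xs of
    [] => nilCase
  | h :: t => consCase (h, t, foldList nilCase consCase t)

(* Analogue of Listcase_c: the cons case receives only the head and tail. *)
fun caseList (nilCase : 'b) (consCase : 'a * 'a list -> 'b)
             (xs : 'a list) : 'b =
  case xs of
    [] => nilCase
  | h :: t => consCase (h, t)
```

For example, foldList 0 (fn (_, _, n) => n + 1) computes the length of a list, whereas caseList can only inspect the outermost cell.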

The monotype elim forms include both Typerec_c and Typecase_c forms. The clauses
are indexed by a primcon, and each primcon can occur in at most one clause. Clauses
corresponding to primcon1 values are functions that take arguments of the appropriate
kind, whereas clauses corresponding to primcon0 values are constructor functions
that take an empty tuple as an argument. The default case is used to match
constructors that do not appear in the arms list. Note that although recursive types are
considered to be monotypes, there is no way to deconstruct them in Lmli; the default
clause of a Typecase_c or Typerec_c is always selected when one of these constructs is
applied to a Mu_c constructor.

The  function  intro  (Fn_c)  and  elim  (App_c)  forms  are  standard.  The  Let_c  form 
can  be  abbreviated  with  these  constructs  as  usual.  However,  this  is  not  possible  in  the 
restricted  Lmli-Bform  that  is  used  in  the  optimizer.  We  retain  Let_c  here  to  support 
pretty-printing,  and  both  kind  and  type-checking  of  Lmli-Bform.  The  All_c  constructor 
is  not  really  a  constructor,  but  rather  a  type.  It  is  always  labelled  with  the  kind  Poly_k. 

The  Mu_c  constructor  is  a  generalized  recursive  type  constructor.  Informally,  Mu_c 
corresponds  to  a  “letrec”  at  the  constructor  level,  simultaneously  binding  the  variables 
to  the  constructors,  within  the  scope  of  the  exported  constructor.  Each  of  the  variables 
is  constrained  to  have  the  kind  Mono_k.  We  require  that  the  constructors  bound  to  the 
variables  be  expansive.  That  is,  if  one  of  the  variables  bound  in  the  Mu_c  occurs  in  a 
constructor  bound  to  a  variable,  then  that  variable  must  occur  within  an  argument  to  a 
primcon1.

The type that a recursive constructor represents is isomorphic to the type obtained by
simultaneously replacing each variable in the exported constructor with the “unrolling” of
the recursive type. For instance, the recursive constructor Mu_c([(x1,c1),(x2,c2)],c)
is isomorphic to {c1'/x1, c2'/x2}c where

c1' = Con(Mu_c([(x1,c1),(x2,c2)], Var_c(x1,Mono_k)), Mono_k)

and

c2' = Con(Mu_c([(x1,c1),(x2,c2)], Var_c(x2,Mono_k)), Mono_k).

This  isomorphism  is  not  implicit  as  in  some  calculi.  At  the  term  level,  we  must  use 
explicit “roll” and “unroll” operations on terms to effect the isomorphism.

The addition of recursive types to λ_i^ML is not straightforward since we have destroyed
the  key  property  that  the  monotypes  can  be  generated  by  induction.  However,  the  elim 
forms  that  we  provide  at  the  constructor  level  treat  recursive  types  as  pseudo-base  cases. 
In  particular,  we  cannot  examine  the  contents  of  a  Mu_c  constructor  with  Typecase_c 
or  Typerec_c.  I  therefore  speculate  that  both  confluence  and  strong-normalization  of 
constructor  reduction  are  preserved,  though  I  have  yet  to  prove  this. 

The primcon0 and primcon1 datatypes are defined as follows:

datatype primcon0 = Int_c | Real_c | String_c | Intarray_c
                  | Realarray_c | Exn_c | Enum_c of int

datatype primcon1 = Ptrarray_c | Arrow_c | Sum_c | Sumcursor_c
                  | Record_c | Recordcursor_c | Excon_c | Deexcon_c
                  | Enumorrec_c of int | Enumorsum_c of int

Most of the primcon0 data constructors are self-explanatory. We provide a string
constructor even though string values are represented in terms of other constructs (integer
arrays). This distinction is necessary to support the proper semantics of polymorphic
equality, since strings are compared by value whereas arrays are compared by reference.
The Exn_c constructor is used to type exception packets, and the Enum_c constructor is
used in the translation of datatypes. Enum values are used to represent data
constructors with no argument (e.g., nil). Enum values are also used to tag variant records.
We assume that enum values are always distinguishable from pointers to heap-allocated
objects. Currently, we represent enum values as small integers between 0 and 255. We
could represent enum values as odd integers, assuming pointers are always evenly aligned.

The primcon1 data constructors are primitive constructors of one argument. Multiple
arguments are simulated by a constructor tuple or a list of constructors. The Ptrarray_c
(pointer array) constructor corresponds to arrays that contain any value except for
integers or reals. The Arrow_c (arrow) constructor corresponds to functions at the term
level. Functions in Lmli take multiple arguments and yield one result. Therefore, the
arrow constructor takes two arguments (as a constructor tuple), and these arguments
correspond to the domain types and the range type. The domain types are represented as a
list of constructors. Thus, arrow has the kind Tuple_k[List_k(Mono_k), Mono_k] → Mono_k.

The Record_c (record) constructor corresponds to n-tuples at the term level and has
the kind List_k(Mono_k) → Mono_k. Recordcursor_c values are used to iterate over the
components of a Record_c value. Roughly speaking, a record cursor is a pair consisting
of a pointer to a record and an integer offset.

The Sum_c (sum) constructor represents immutable, Pascal-style variant record
types. Sum values are in fact records where the first component of the record
contains an Enum_c value indicating the variant. Thus, the sum constructor has kind
List_k(List_k(Mono_k)) → Mono_k. For instance, values with a type described by the
constructor

Sum_c[[Int_c, Real_c], [String_c]]

are either records with a type described by

Record_c[Enum_c 2, Int_c, Real_c]

or else records with a type described by

Record_c[Enum_c 2, String_c].

Similar  to  record  cursors,  Sumcursor_c  values  provide  a  means  for  folding  a  computation 
across  a  sum. 

Values  described  by  Excon_c  and  Deexcon_c  are  used  to  represent  SML  exceptions. 
In particular, each creation of an SML exception carrying type τ is translated to a single
operation that creates a pair consisting of an Excon_c(τ) value and a Deexcon_c(τ)
value. A value of type Excon_c(τ) can be applied to a value of type τ to yield a value
of type Exn_c, thereby hiding the type τ. A value of type Deexcon_c(τ) can be applied
to a value of type Exn_c. The result is essentially a τ option: a variant record where the
first variant is empty (None) and the second variant contains a value of type τ (Some).
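
The pairing of an injection with a partial projection can be imitated in source-level SML with a locally declared exception. The sketch below is an analogy only: the name newExn is hypothetical, and the SML types 'a -> exn and exn -> 'a option merely stand in for Excon_c(τ) and Deexcon_c(τ).

```sml
(* Each call creates a fresh exception carrying type 'a and returns
   the pair (injection, projection). *)
fun 'a newExn () : ('a -> exn) * (exn -> 'a option) =
  let
    exception E of 'a
    fun deexcon (packet : exn) : 'a option =
      case packet of
          E v => SOME v
        | _ => NONE
  in
    (E, deexcon)
  end

val (mkInt, deInt) : (int -> exn) * (exn -> int option) = newExn ()
val (mkStr, deStr) : (string -> exn) * (exn -> string option) = newExn ()

val packet : exn = mkInt 42   (* the carried type int is now hidden *)
val hit = deInt packet        (* SOME 42 *)
val miss = deStr packet       (* NONE: packet belongs to another exception *)
```

Because exception declarations are generative, two calls to newExn at the same type still yield exceptions that do not match one another.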

Finally,  the  Enumorrec_c  and  Enumorsum_c  constructors  are  special  cases  of  variant 
records used to optimize the representation of SML datatypes. In particular, enum-or-rec
values are either an enum value or a record value, whereas enum-or-sum values are either
an enum value or a variant record (i.e., sum) value. Since records and sums are always
allocated  (i.e.,  pointers),  we  can  always  distinguish  enum  values  from  records  and  sums. 

8.4.2  Terms  of  Lmli 

The terms of Lmli are described by the SML datatype given in Figure 8.3. Similar to
constructors,  each  term  is  labelled  with  its  type,  where  the  type  is  represented  as  an 
Lmli  con.  Labelling  each  term  with  its  type  simplifies  type  checking  and  type-directed 
translation.  The  space  overheads  of  this  fully  typed  representation  are  not  quite  as  great 
as  we  might  first  expect.  This  is  because  we  can  bind  types  to  variables  (via  Let_e) 
and  use  the  variable  in  place  of  the  type  representation.  Then,  we  can  use  standard 
optimization  techniques,  such  as  common  sub-expression  elimination  and  hoisting,  to 
eliminate  redundant  type  definitions. 

    datatype exp = Exp of (raw_exp * con)
    and raw_exp =
        Var_e of var
      | Dot_e of strid * int * label
      | Record_e of exp list
      | Inject_e of int * (exp list)
      | Int_e of word
      | Real_e of string
      | Enum_e of int
      | String_e of string
      | Let_e of decl * exp
      | Fn_e of function
      | Tfn_e of tfunction
      | App_e of exp * (exp list)
      | Tapp_e of exp * (con list)
      | Coerce_e of coerceop * exp
      | Op1_e of op1 * exp
      | Op2_e of op2 * exp * exp
      | Misc_e of miscop
      | Switch_e of switch_exp
      | Typecase_e of typecase_exp
      | Tlistcase_e of tlistcase_exp
      | Raise_e of exp
      | Handle_e of exp * function
      | Export_e of
          {types : (label * con) list, values : (label * exp) list}
    and decl =
        Var_d of (var * con * exp)
      | Con_d of (var * kind * con)
      | Fix_d of (var * con * function) list
      | Fixtype_d of (var * con * tfunction) list
    and function = Func of (var * con) list * exp
    and tfunction = Tfunc of (var * kind) list * exp


Figure  8.3:  Terms  of  Lmli 

The raw expressions include variables, projections from imported modules, literal
values,  various  primitive  operations,  various  switch  operations,  (recursive)  value  and 
constructor  abstractions,  value  and  constructor  applications,  exception  primitives,  let- 
expressions,  and  a  mechanism  for  exporting  type  definitions  and  values.  In  the  rest  of 
this  section,  I  briefly  describe  some  of  these  constructs. 

The Inject_e expression is used to construct a sum value (i.e., variant record). The
integer component must be an enum value (i.e., between 0 and 255). Operationally,
Inject_e just allocates a record of size n + 1, places the enum component in the first
position, and then places the other n values in positions 2 through n + 1.
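
As a rough operational sketch of Inject_e, assuming a hypothetical run-time value model in which records are lists of values (the value datatype and the function inject are illustrative names, not TIL's):

```sml
(* Hypothetical run-time values: integers, enums, and heap records. *)
datatype value = Int_v of int | Enum_v of int | Record_v of value list

(* inject (tag, [v1, ..., vn]) builds a record of size n + 1: the enum
   tag in the first position, then the n field values. *)
fun inject (tag : int, fields : value list) : value =
  if tag < 0 orelse tag > 255
  then raise Fail "inject: tag is not an enum value"
  else Record_v (Enum_v tag :: fields)

val v = inject (1, [Int_v 7, Int_v 9])
(* v = Record_v [Enum_v 1, Int_v 7, Int_v 9] *)
```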

Let_e expressions provide a means of declaring variables within a scope. Variables in
a declaration are bound to expressions (Var_d), constructors (Con_d), or mutually
recursive functions. The Fix_d declaration provides fixed-points for value abstractions
whereas  the  Fixtype_d  declaration  provides  fixed-points  for  constructor  abstractions. 
Both  value  and  constructor  abstractions  can  take  multiple  arguments. 

Coercions  are  primitive  operations  that  have  no  operational  effect,  but  are  needed  for 
type  checking.  The  set  of  coercion  operations  is  given  by 

    datatype coerceop =
        proll | punroll | penum_enumorrec | prec_enumorrec
      | penum_enumorsum | psum_enumorsum | penum2int | pfromstring
      | ptostring | pchr.

The proll and punroll coercions implement the isomorphism between a recursive type and
its unrolling. The penum_enumorrec and penum_enumorsum coercions inject an enum
value into an enum-or-record or enum-or-sum type, whereas the prec_enumorrec and
psum_enumorsum coercions inject a record or sum into an enum-or-record or enum-or-sum type.
The penum2int coercion coerces an enum to an integer value. The pfromstring and
ptostring coercions coerce a string to and from its underlying representation. Finally, the pchr
operation coerces an integer to an enum value. (Technically, the pchr operation should
ensure that the integer value meets the representation constraints of enum values. In
practice, the front-end ensures this by checking to see if the integer lies between 0 and 255.)

The op1 primitive operations are given by the datatype

    datatype op1 =
        preal_i | pnot_i | pfloor_r | psqrt_r | psin_r | pcos_r
      | parctan_r | pexp_r | pln_r | psize_a of spclarray
      | pselect of int | prec_cursor | prec_head | prec_tail
      | psum_cursor,

where

    datatype spclarray = Intarray | Realarray | Ptrarray.

Most  of  the  rest  of  the  operations  perform  some  computation  on  real  values,  such  as 
calculating  the  square  root  (psqrt_r)  or  natural  log  (pln_r).  The  psize_a  operation 
calculates  the  size  of  an  array. 

The prec_cursor operation takes a record of type Record_c[c1, ..., cn] and pairs
it with the integer 0 to form a record cursor of type Recordcursor_c[c1, ..., cn]. Similarly,
the psum_cursor operation takes a sum value of type Sum_c[c1, ..., cn] and pairs it
with the enum 0 to form a sum cursor of type Sumcursor_c[c1, ..., cn]. The prec_head
operation takes a record cursor of type Recordcursor_c[c1, ..., cn] and returns a value
of type c1. This value is obtained by taking the current offset of the record cursor and
selecting this component from the record of the record cursor. The prec_tail operation
takes a record cursor of type Recordcursor_c[c1, c2, ..., cn] and returns a new record
cursor of type Recordcursor_c[c2, ..., cn]. This new cursor is obtained by taking the
offset of the old cursor, incrementing it, and then pairing it with the record of the old
cursor to form a new cursor. When combined with the term-level listcase operations on
constructors, record cursors can be used to fold an operation across the components of a
record. We use this facility, for example, to compute polymorphic equality on records of
arbitrary arity.
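
The following SML sketch shows how such a fold might look for equality on records. It assumes a hypothetical value model (records as vectors, two base types only); the functions eqAt and eqRecord are illustrative and are not the terms used by TIL, and recursion over an SML list of component types stands in for the listcase on constructors.

```sml
datatype con = Int_c | String_c
datatype value = Int_v of int | String_v of string

type record = value vector
type cursor = record * int        (* a record paired with an offset *)

fun prec_cursor (r : record) : cursor = (r, 0)
fun prec_head ((r, i) : cursor) : value = Vector.sub (r, i)
fun prec_tail ((r, i) : cursor) : cursor = (r, i + 1)

(* Equality at a single, known component type. *)
fun eqAt (Int_c, Int_v a, Int_v b) = a = b
  | eqAt (String_c, String_v a, String_v b) = a = b
  | eqAt _ = raise Fail "eqAt: value does not match its type"

(* Walk the list of component types while advancing both cursors. *)
fun eqRecord ([], _, _) = true
  | eqRecord (c :: cs, cur1, cur2) =
      eqAt (c, prec_head cur1, prec_head cur2)
      andalso eqRecord (cs, prec_tail cur1, prec_tail cur2)

val r1 = Vector.fromList [Int_v 1, String_v "a"]
val r2 = Vector.fromList [Int_v 1, String_v "b"]
val same = eqRecord ([Int_c, String_c], prec_cursor r1, prec_cursor r1)
val diff = eqRecord ([Int_c, String_c], prec_cursor r1, prec_cursor r2)
(* same = true, diff = false *)
```

The record itself is never copied: each step only builds a new (record, offset) pair, which is what lets one piece of code handle records of any arity.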

The  op2  primitive  operations  take  two  arguments  and  are  defined  as  follows: 

    datatype op2 =
        pdiv_i | pmul_i | pplus_i | pminus_i | pmod_i
      | peq_i | plst_i | pgtt_i | plte_i | pgte_i
      | pdiv_ui | pmul_ui | pplus_ui | pminus_ui
      | plst_ui | pgtt_ui | plte_ui | pgte_ui
      | por_i | pand_i | pxor_i | plshift_i | prshift_i
      | pdiv_r | pmul_r | pplus_r | pminus_r | peq_r
      | plst_r | pgtt_r | plte_r | pgte_r
      | palloc_a of spclarray | psub_a of spclarray
      | pexcon | pde_excon | peqptr

Operations  ending  in  “_i”  are  signed,  checked  integer  operations,  whereas  operations 
ending  in  “_ui”  are  unsigned,  unchecked  operations.  (The  checks  are  for  overflow  and 
divide by zero.) The operations ending in “_r” are double-precision (i.e., 64-bit) IEEE
floating  point  operations. 

The  palloc_a  operations  allocate  arrays  of  the  appropriate  type,  where  the  size  is 
determined  by  the  first  argument  and  the  array  is  initialized  with  the  second  argument. 
The  psub_a  operation  extracts  a  value  from  an  array  (the  first  argument)  at  the  given 
offset (the second argument). The pexcon operation takes a value of type Excon_c(τ)
and a value of type τ and returns a value of type Exn_c. The pde_excon operation takes a
value of type Exn_c and a value of type Deexcon_c(τ) and returns an Enumorrec_c(1, [τ])
value. The resulting value is either an enum or else a record containing a τ value.

The  miscop  operations  are  given  by  the  following  datatype: 

    datatype miscop =
        pupdate_a of spclarray * exp * exp * exp
      | pextern of string * con
      | pnew_exn of con
      | peq of con
      | vararg of con * exp
      | onearg of con * exp * exp

The  pupdate_a  operation  is  used  to  update  an  array.  The  front-end  ensures  that  the  index 
is  within  range.  The  pextern  operation  is  used  to  reference  external  labels,  exported  by 
the runtime or a foreign language (e.g., C). The pnew_exn operation takes a constructor
τ and returns a pair of an Excon_c(τ) value and a Deexcon_c(τ) value.

The  peq  operation  corresponds  to  polymorphic  equality  at  the  monotype  denoted  by 
the  given  constructor.  This  operation  can  be  coded  using  various  other  operations  in  the 
language  (see  Section  5.2.3).  In  fact,  a  later  stage  in  the  compiler  replaces  all  occurrences 
of  peq  with  a  call  to  such  a  function  defined  in  a  separate,  globally  shared  module.  We 
leave  the  operation  as  a  primitive  so  that  the  optimizer  can  easily  recognize  and  specially 
treat  the  operation. 

The vararg(τ, e) and onearg(τ, e, e′) operations are used to implement dynamic
argument flattening and roughly correspond to the vararg and onearg terms of Chapter
5 (see Sections 5.2.4 and 5.2.5).

The vararg operation takes a constructor τ and a function e. The typing rules
constrain e to take a single argument of type τ. The operation calculates a coercion,
based on τ, that turns e into a multi-argument function. In particular, if τ is a record
constructor of the form Record_c[c1, ..., cn], then the coercion is a function that takes
n arguments of type c1, ..., cn, respectively. These arguments are placed into a record
and passed to the original function e. If τ is not a record, then the coercion is the identity.

The onearg operation takes a constructor τ, a function e, and an argument e′.
If τ is not a record constructor, then the operation simply applies e to e′. If τ is a
record constructor of the form Record_c[c1, ..., cn], then e is constrained by the
typing rules to be a multi-argument function that takes arguments of type c1, ..., cn,
respectively, and e′ is constrained to be a record of type τ. In this case, onearg extracts
the components of the record e′ and passes them directly as arguments to e.
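
The behaviour of the two primitives can be approximated in SML over a hypothetical universal value type. In this sketch a multi-argument function is modelled as a function on an argument list, so a one-argument function takes a singleton list; the con and value datatypes and the example function sumPair are assumptions made for illustration.

```sml
datatype con = Int_c | Record_c of con list
datatype value = Int_v of int | Record_v of value list

(* vararg (tau, f): f expects ONE argument of type tau; the result
   expects the flattened argument list. *)
fun vararg (Record_c _, f : value -> value) : value list -> value =
      (fn args => f (Record_v args))
  | vararg (_, f) =
      (fn [x] => f x
        | _ => raise Fail "vararg: expected exactly one argument")

(* onearg (tau, f, arg): f expects the flattened argument list; arg is
   a single value of type tau. *)
fun onearg (Record_c _, f : value list -> value, Record_v fields) = f fields
  | onearg (Record_c _, _, _) = raise Fail "onearg: expected a record"
  | onearg (_, f, arg) = f [arg]

(* sumPair takes a single record argument. *)
fun sumPair (Record_v [Int_v a, Int_v b]) = Int_v (a + b)
  | sumPair _ = raise Fail "sumPair: expected a pair of integers"

val pairTy = Record_c [Int_c, Int_c]
val flat = vararg (pairTy, sumPair)      (* now takes two arguments *)
val r1 = flat [Int_v 3, Int_v 4]
val r2 = onearg (pairTy, flat, Record_v [Int_v 3, Int_v 4])
(* r1 = Int_v 7 and r2 = Int_v 7 *)
```

Composing onearg with vararg at the same constructor gives back the behaviour of the original function, which is the property the flattening translation relies on.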

The typing constraints on vararg and onearg are expressed using Typecase_c (see
Section 5.2.1). For a fixed number of arguments, both vararg and onearg can be implemented
directly in Lmli by a combination of term-level Typecase_e and Tlistcase_e
expressions that deconstruct the argument constructor and calculate the appropriate coercion.
Like peq, a later phase in the compiler replaces all occurrences of vararg and
onearg with references to these terms, which are placed in a globally shared module. We
leave both forms as primitive operations to facilitate optimization.

The Switch_e expression provides a combination of a control flow operator and a
primitive deconstructor for integer, enum, sum, enum-or-record, enum-or-sum, and sum
cursor values. The switch_exp argument to Switch_e is defined by the SML type
abbreviation

    type switch_exp =
        {switch_type : switch_type,
         arg : exp,
         arms : (word * function) list,
         default : exp Option},

where switch_type is given by

    datatype switch_type =
        Int_sw | Enum_sw | Sum_sw | Enumorrec_sw | Enumorsum_sw
      | Sumcursor_sw

Each clause or arm of the switch is indexed by a 32-bit word. For integer and enum
switches, the value of the argument is used to select the appropriate arm according to
this index. In these cases, the arms are functions that take no arguments. For sum
switches, the enum in the first position of the sum is used to select the appropriate arm.
The arm function must take a record whose type corresponds to the appropriate variant. For
instance, if e has type Sum_c[[Int_c, String_c], [Int_c]], then the 0-arm must be a
function that takes a record of type Record_c[Enum_c 2, Int_c, String_c] whereas the
1-arm must be a function that takes a record of type Record_c[Enum_c 2, Int_c]. If the
switch is for an enum-or-record or enum-or-sum value, then the 0-arm corresponds to the
enum case whereas the 1-arm corresponds to either a record or sum. In all cases, if an
arm is missing, then the default expression is chosen. Defaults are required unless the
arms are exhaustive.

Recall that sum cursor values are implemented as pairs consisting of an enum and a
sum value. Switch expressions for sum cursor values are evaluated as follows: if the enum
value of the cursor matches the enum value of the sum value, then the 0-arm is selected
and the sum value is passed as an argument. If the enum value of the cursor does not
match, then the 1-arm is selected and a new cursor value is constructed from the old.
The new cursor has the same sum value, but increases the index by one. If the original
sum cursor has type Sumcursor_c[c1, c2, ..., cn], then the new sum cursor value has
type Sumcursor_c[c2, ..., cn]. Therefore, switch on sum cursors provides a means for
eliminating one of the possible cases in a sum.
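
The evaluation rule for switches on sum cursors can be sketched in SML as follows. The representation of sums as (tag, fields) pairs and the helper names switchCursor and dispatch are assumptions made for this illustration only.

```sml
datatype value = Int_v of int | String_v of string

(* A sum value: the variant tag followed by the variant's fields. *)
type sum = int * value list
(* A sum cursor: a candidate tag paired with the sum value. *)
type sumcursor = int * sum

fun psum_cursor (s : sum) : sumcursor = (0, s)

(* The 0-arm receives the sum when the tags match; otherwise the
   1-arm receives a cursor whose index is increased by one. *)
fun switchCursor ((i, s as (tag, _)) : sumcursor,
                  arm0 : sum -> 'a,
                  arm1 : sumcursor -> 'a) : 'a =
  if i = tag then arm0 s else arm1 (i + 1, s)

(* Try one handler per variant, eliminating a case at each step. *)
fun dispatch ([], _) = raise Fail "dispatch: no variant matched"
  | dispatch (h :: hs, cur) =
      switchCursor (cur, h, fn cur' => dispatch (hs, cur'))

val describe = [fn (_ : sum) => "first variant",
                fn (_ : sum) => "second variant"]
val answer = dispatch (describe, psum_cursor (1, [String_v "x"]))
(* answer = "second variant" *)
```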

The Tlistcase_e and Typecase_e forms provide a simple elimination form for constructors
at the term level. We do not provide Fold_e or Typerec_e at the term level,
because these may be simulated with Fixtype_d. The argument to a Tlistcase_e is
described by

type  tlistcase_exp  = 

{arg  :  con, 
scheme:  var  *  kind  *  con, 
nil_c  :  exp, 
cons_c:  tfunction}. 

The  arg  component  is  the  argument  constructor,  which  must  be  of  a  list  kind.  The 
scheme  component  is  a  type  scheme  used  to  describe  the  type  of  the  clauses  as  well 
as the type of the entire expression. The type of the entire expression is obtained by
substituting arg for the type variable of the scheme within the constructor of the scheme.
The  nil_c  clause  must  have  a  type  described  by  substituting  Nil_c  (at  the  appropriate 
kind)  for  the  variable  in  the  constructor.  The  cons_c  clause  must  be  a  function  that 
abstracts  two  constructor  arguments,  corresponding  to  the  head  and  tail  of  the  list  of 
constructors. 

The  argument  to  a  Typecase_e  is  described  by 

type  typecase_exp  = 

{arg  :  con, 
scheme  :  var  *  kind  *  con, 
arms  :  (primcon  *  tfunction)  list, 
mu_arm  :  tfunction  Option, 
default  :  tfunction  Option} 

Here,  the  arg  component  must  be  of  kind  Mono_k.  Again,  the  entire  type  is  obtained 
by  substituting  arg  for  the  variable  within  the  constructor  of  the  scheme.  Each  arm  is 
indexed  by  a  primitive  constructor.  The  mu_arm  matches  any  Mu_c  values.  The  “unrolling” 
of  the  recursive  constructor  is  passed  as  an  argument  to  this  clause  when  it  is  selected. 
The  default  component  is  selected  if  the  argument  does  not  match  any  of  the  arms. 

Raise_e raises an exception packet of type Exn_c, whereas Handle_e(e, f) evaluates
e and, if an exception is raised, passes the exception packet to the function f. The
function can use a combination of the pde_excon primitive and a Switch_e to determine
what exception was raised and extract a value from the packet.

Finally,  the  Export_e  form  is  not  an  expression,  but  rather  an  anonymous  module. 
Each  compilation  unit  in  TIL  is  constrained  to  be  an  expression  consisting  of  a  series  of 
declarations  that  terminate  with  an  Export_e.  This  form  specifies  a  list  of  constructors 
and  values  that  are  to  be  exported  by  the  module.  Each  module  is  given  a  globally  unique 
strid  M.  The  module  is  described  by  a  signature 

    datatype signat =
        Signat of {types : (label * kind * (con Option)) list,
                   values : (label * con) list}

that  describes  the  kinds  and  types  of  the  constructors  and  values  exported  by  the  module. 
Each  exported  constructor  can  optionally  include  the  definition  of  the  constructor  in  the 
signature.  In  this  respect,  Lmli  signatures  resemble  the  translucent  sums  of  the  Harper 
and  Lillibridge  module  calculus  [58].  The  types  of  the  values  can  contain  references  to 
the  constructors  exported  by  the  module  —  using  the  Dot_c  notation  —  relative  to  the 
module  strid  M.  The  module  itself  is  represented  with  a  datatype  of  the  form 

    datatype module =
        Module of {name : strid,
                   imports : (strid * signat) list,
                   signat : signat,
                   body : exp}.

The  imports  component  specifies  the  set  of  modules  (and  their  signatures)  upon  which 
this  module  depends. 


8.5  Lambda  to  Lmli 

In  the  translation  from  Lambda  to  Lmli,  I  eliminate  datatype  definitions  and  perform 
a  series  of  type-directed  transformations  that  specialize  arrays  and  refs,  flatten  function 
arguments,  box  floating  point  values,  and  flatten  certain  representations  of  datatypes. 
All of these type-directed translations make use of dynamic type analysis when they
encounter unknown types. I also provide Lmli terms that implement polymorphic equality,
and  the  vararg  and  onearg  primitives  as  suggested  in  Chapter  5. 

In  this  section,  I  show  how  I  compile  datatypes  to  Lmli  constructors,  and  discuss  the 
various type-directed translations. I also contrast my approach to datatype representations
with that of SML/NJ and show that, unlike SML/NJ, I am able to flatten data
constructors  without  restricting  abstraction. 

8.5.1  Translating  Datatypes 

I  translate  a  simple  datatype  definition  of  the  form 

    datatype (α_1, ..., α_n) T = D_1 | D_2 | ... | D_m,

to a constructor function that abstracts the type variables (α_1 through α_n). The body of
the function is a Mu_c constructor where T is bound to a representation of its definition
(discussed below). For example, the SML datatype

    datatype α tree = Leaf of α | Node of α tree * α tree

is compiled to a constructor function of the form

    tree = λα::Mono_k. Mu_c([(t, Sum_c[[α], [[Var_c t, Var_c t]]])], Var_c t).

(I have elided some kind information and used λ to represent the constructor function.)
Within the definition of the datatype, I replace recursive references with the Mu_c-bound
variable. For instance, in the previous tree definition, I replaced α tree with the variable
t. This replacement is possible because I always verify that a datatype is applied to the
same type variables that it abstracts (see Section 8.3). The resulting constructor has
kind Mono_k -> Mono_k.

The  translation  of  a  datatype  applied  to  some  type  argument  is  straightforward:  I 
simply  apply  the  constructor  function  corresponding  to  the  datatype  to  the  translation 
of  the  type  arguments. 

This  approach  to  datatypes  was  originally  suggested  by  Harper  [59],  though  he  also 
suggests  the  use  of  an  existential  to  hide  the  representation  of  the  datatype.  Hiding  the 
representation  of  the  datatype  is  important  at  the  source  level,  since  this  distinguishes 
user  types  that  happen  to  have  the  same  representation.  However,  within  the  back-end 
of  a  compiler,  there  is  no  advantage  to  such  abstraction.  Therefore,  I  do  not  abstract  the 
representations  of  datatypes. 

The  general  form  of  an  SML  datatype  definition  is  a  series  of  mutually  recursive 
definitions  of  the  form 


    datatype (α_{1,1}, ..., α_{1,n_1}) T_1 = D_{1,1} | D_{1,2} | ... | D_{1,m_1}
    and      (α_{2,1}, ..., α_{2,n_2}) T_2 = D_{2,1} | D_{2,2} | ... | D_{2,m_2}
    ...
    and      (α_{p,1}, ..., α_{p,n_p}) T_p = D_{p,1} | D_{p,2} | ... | D_{p,m_p}

In  this  general  case,  I  generate  one  constructor  function  that  abstracts  all  of  the  unique 
type  arguments.  The  function  uses  a  single  Mu_c  constructor  to  define  the  constructors 
simultaneously.  I  then  export  the  set  of  constructors  corresponding  to  the  datatype  as  a 
constructor  tuple.  Individual  types  are  obtained  by  instantiating  the  type  variables  and 
projecting  the  appropriate  component  from  the  tuple. 

The  representation  that  I  choose  for  a  datatype 

    datatype (α_1, ..., α_n) T = D_1 | D_2 | ... | D_m

depends on the form of the data constructors, D_1, D_2, ..., D_m. In SML, data constructors
can  have  zero  or  one  argument.  I  choose  the  representation  of  the  datatype  according  to 
the  following  cases: 


•  If  all  of  the  data  constructors  take  zero  arguments,  then  we  use  an  Enum_c  as 
the  translation  of  the  datatype.  For  example,  datatype  bool  =  true  |  false  is 
represented  as  an  (Enum_c  2)  constructor. 

•  If  there  is  only  one  data  constructor  and  this  data  constructor  takes  an  argument 
of  type  r,  then  we  use  the  translation  of  r  as  the  representation  of  the  datatype. 

•  If  all  of  the  data  constructors  take  one  argument,  and  there  is  more  than  one  data 
constructor,  we  use  a  Sum_c  (variant  record)  as  the  translation  of  the  datatype. 
For example, datatype foo = Bar of int | Baz of real is translated to the
constructor Sum_c[[Int_c], [Real_c]].

•  If all but one of the data constructors take zero arguments, then we use an
Enumorrec_c to represent the datatype. For example, the datatype α list = nil
| :: of α * (α list) is translated to an Enumorrec_c(1, [Record_c[α, Var_c t]])
constructor (where t is recursively bound to the definition). The data constructor
that takes an argument (e.g., cons) is always represented as a record so
that it can be distinguished from Enum_c values. However, this introduces extra
indirection when the argument to the data constructor is already a record (e.g., α
* (α list)). A later phase eliminates this extra indirection when possible (see
Section 8.5.3).

•  If  there  is  more  than  one  data  constructor  that  takes  an  argument  and  there  are  data 
constructors  that  take  no  arguments,  the  datatype  is  translated  to  an  Enumorsum_c 
constructor. Enum_c values are used for the data constructors that take no arguments,
whereas Sum_c values are used for the data constructors that take arguments.

A naive representation of datatypes might map each datatype to a variant record. However,
this approach would cause values such as true, false, and nil to be allocated. By
mapping  datatypes  to  the  various  efficient  representation  types,  we  avoid  a  great  deal  of 
allocation. 
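
The case analysis above can be summarized as a small SML function. This is a sketch under simplifying assumptions: each data constructor is reduced to whether it carries an argument, argument types are kept as opaque strings, and the names dcon, rep, and chooseRep are hypothetical.

```sml
(* A data constructor either takes no argument or carries one whose
   type is written here as an uninterpreted string. *)
datatype dcon = Nullary | Carries of string

datatype rep =
    Enum_r of int                   (* all constructors are constants *)
  | Transparent_r of string         (* one constructor, with an argument *)
  | Sum_r of string list            (* several, all with arguments *)
  | Enumorrec_r of int * string     (* constants plus one with an argument *)
  | Enumorsum_r of int * string list

fun chooseRep (dcons : dcon list) : rep =
  let
    val consts = length (List.filter (fn Nullary => true | _ => false) dcons)
    val args = List.mapPartial (fn Carries t => SOME t | Nullary => NONE) dcons
  in
    case (consts, args) of
        (_, []) => Enum_r consts
      | (0, [t]) => Transparent_r t
      | (0, ts) => Sum_r ts
      | (n, [t]) => Enumorrec_r (n, t)
      | (n, ts) => Enumorsum_r (n, ts)
  end

val boolRep = chooseRep [Nullary, Nullary]
val fooRep = chooseRep [Carries "int", Carries "real"]
val listRep = chooseRep [Nullary, Carries "'a * 'a list"]
(* boolRep = Enum_r 2
   fooRep  = Sum_r ["int", "real"]
   listRep = Enumorrec_r (1, "'a * 'a list") *)
```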

8.5.2  Specializing  Arrays  and  Boxing  Floats 

During the translation from Lambda to Lmli, I translate polymorphic array operations
such as sub and update to constructor abstractions that perform a typecase on the
unknown type. The typecase selects the appropriate primitive operation (e.g., psub_a
Realarray, psub_a Intarray, or psub_a Ptrarray) according to the instantiation of this
type.
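
A minimal sketch of this selection, written as ordinary SML over a hypothetical run-time type representation. The three-way con datatype and the functions arrayKind and subPrimName are assumptions; the real translation produces Lmli Typecase_e terms rather than SML functions.

```sml
(* Hypothetical element types: integers, floats, and everything else. *)
datatype con = Int_c | Real_c | Other_c
datatype spclarray = Intarray | Realarray | Ptrarray

(* The analogue of a typecase on the element type. *)
fun arrayKind Int_c = Intarray
  | arrayKind Real_c = Realarray
  | arrayKind _ = Ptrarray

(* Name of the subscript primitive chosen for an element type. *)
fun subPrimName (c : con) : string =
  case arrayKind c of
      Intarray => "psub_a Intarray"
    | Realarray => "psub_a Realarray"
    | Ptrarray => "psub_a Ptrarray"

val chosen = map subPrimName [Int_c, Real_c, Other_c]
```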

We  chose  to  distinguish  integer  and  floating  point  arrays  from  other  kinds  of  arrays 
for  a  variety  of  reasons:  first,  in  the  presence  of  a  generational  collector,  the  update 
operation on these arrays does not require a write barrier, because the value placed in
the  array  can  never  point  across  generational  boundaries.  Second,  by  leaving  integer 
arrays  untagged  and  unboxed,  we  are  able  to  use  them  to  represent  both  raw  strings 
and  byte  arrays.  Third,  by  distinguishing  floating  point  arrays,  we  are  able  to  align 
such  arrays  on  64-bit  boundaries,  providing  efficient  access  to  the  elements  of  the  array. 
SML/NJ  provides  specialized  monomorphic  strings,  bytearrays,  and  floating  point  arrays 
for  these  same  reasons.  However,  any  array  library  function  —  such  as  an  iterator  — 
must  be  coded  for  each  of  these  array  types. 

After  translating  Lambda  to  Lmli,  I  perform  a  series  of  type-directed  translations 
on  the  resulting  Lmli  code.  The  first  translation  ensures  that  all  float  values  are  boxed 
(i.e., placed in a record), except when they are placed in floating point arrays. The
translation  simply  boxes  floating  point  literals,  unboxes  floats  as  they  are  passed  to 
primitive  operations  (e.g.,  pplus_r),  and  boxes  float  results  of  primitive  operations.  Much 
of  the  boxing  and  unboxing  is  eliminated  within  a  function  by  conventional  optimization. 
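
The boxing translation and the simplification that removes redundant box/unbox pairs can be sketched on a toy expression language. The exp datatype below, with explicit Box and Unbox nodes, and the functions boxFloats and simplify are assumptions for illustration; they are not the Lmli terms or the TIL optimizer.

```sml
datatype exp =
    Real_e of string        (* float literal, kept as text *)
  | Var_e of string         (* variables are assumed to hold boxed floats *)
  | Plus_r of exp * exp     (* primitive on raw floats, raw result *)
  | Box of exp
  | Unbox of exp

(* boxFloats yields an expression whose value is a boxed float. *)
fun boxFloats (Real_e s) = Box (Real_e s)
  | boxFloats (Var_e x) = Var_e x
  | boxFloats (Plus_r (a, b)) =
      Box (Plus_r (Unbox (boxFloats a), Unbox (boxFloats b)))
  | boxFloats e = e

(* Unboxing a freshly boxed value is a no-op, so drop the pair. *)
fun simplify (Unbox e) =
      (case simplify e of
           Box e' => e'
         | e' => Unbox e')
  | simplify (Box e) = Box (simplify e)
  | simplify (Plus_r (a, b)) = Plus_r (simplify a, simplify b)
  | simplify e = e

val source = Plus_r (Plus_r (Var_e "x", Real_e "1.0"), Real_e "2.0")
val optimized = simplify (boxFloats source)
(* optimized =
   Box (Plus_r (Plus_r (Unbox (Var_e "x"), Real_e "1.0"), Real_e "2.0")) *)
```

After simplification only the incoming variable is unboxed and only the final result is boxed; the intermediate sum stays unboxed.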

8.5.3  Flattening  Datatypes 

After  boxing  floats,  I  perform  a  type-directed  translation  to  flatten  Enumorrec_c  values. 
Consider  the  list  datatype: 

    datatype α list = nil | :: of α * (α list)

This datatype is initially translated to the Lmli constructor

    list = λα::Mono_k.
             Mu_c([(t, Enumorrec_c(1, [Record_c[α, Var_c t]]))], Var_c t).

At this stage, list values will either be an (Enum_c 1) value corresponding to nil or a
Record_c[Record_c[α, list(α)]] value corresponding to cons. Because the contents
of a cons cell are always a record (Record_c[α, list(α)]), we can always distinguish
such values from nil and thus eliminate the extra Record_c[-] in cons cells. After the
constructor flattening phase, the list datatype is represented by the constructor

    list = λα::Mono_k. Mu_c([(t, Enumorrec_c(1, [α, Var_c t]))], Var_c t).

This  optimization  eliminates  an  extra  level  of  indirection  in  every  cons  cell,  and  is  thus 
very  important  for  typical  SML  code,  which  does  a  fair  amount  of  list-processing. 

However,  we  cannot  always  determine  at  compile  time  whether  or  not  we  can  flatten 
an  Enumorrec_c  constructor.  In  particular,  consider  the  option  datatype: 

    datatype α option = NONE | SOME of α

The initial translation of this datatype yields the constructor

    option = λα::Mono_k. Enumorrec_c(1, [α]).

Since the data constructor SOME has an argument of unknown type (α), we cannot determine
whether SOME will always be applied to a record and thus cannot determine whether
we can flatten the representation. In particular, the SML type int option should not
be flattened because we cannot always tell integer values from Enum_c values.

Therefore, when we encounter an Enumorrec_c(n, α) constructor, we use Typecase_c
on α to determine whether or not the constructor should be flattened. After
constructor flattening, the option datatype is represented by the constructor

    option = λα::Mono_k. Typecase_c α of
                 Record_c[τ_1, ..., τ_n] => Enumorrec_c(1, [τ_1, ..., τ_n])
               | _ => Enumorrec_c(1, [α])

When  constructing  an  option  value  or  deconstructing  an  option  value,  we  must  use 
Typecase_e  at  the  term  level  to  determine  the  proper  code  sequence. 
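
The effect of this type-level Typecase_c can be mimicked by an SML function over a hypothetical, cut-down constructor datatype. The names flatten and optionCon are illustrative only.

```sml
datatype con =
    Int_c
  | Record_c of con list
  | Enumorrec_c of int * con list

(* Flatten Enumorrec_c (n, [c]) only when c is a record: its
   components then become the components of the non-enum case. *)
fun flatten (Enumorrec_c (n, [Record_c cs])) = Enumorrec_c (n, cs)
  | flatten c = c

(* The option constructor as a function of its argument. *)
fun optionCon (a : con) : con = flatten (Enumorrec_c (1, [a]))

val intOption = optionCon Int_c
val pairOption = optionCon (Record_c [Int_c, Int_c])
(* intOption  = Enumorrec_c (1, [Int_c])          not flattened
   pairOption = Enumorrec_c (1, [Int_c, Int_c])   flattened *)
```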

My approach generalizes the constructor flattening performed in the SML/NJ compiler.
In SML/NJ, cons cells are flattened but SOME cells are not, precisely because the
compiler cannot determine at compile time whether it can safely flatten option datatypes.
Even  to  support  flattened  cons  cells,  SML/NJ  restricts  the  programmer  from  writing 
certain  legal  SML  programs  [10].  In  particular,  SML/NJ  will  not  let  the  programmer 
abstract  the  contents  of  a  cons  cell  in  a  signature  as  follows 

    signature LIST =
    sig
      type α Abstractions
      datatype α list = nil | :: of α Abstractions
    end

    structure List : LIST =
    struct
      datatype α list = nil | :: of α Abstractions
      withtype α Abstractions = α * α list
    end.

Typing  this  code  into  the  SML/NJ  (version  1.08)  interactive  system  yields  the  following 
message: 

std_in:0.0-23.5 Error: The constructor :: of datatype list
has  different  representations  in  the  signature  and  the  structure. 
Change  the  definition  of  the  types  carried  by  the  constructors  in 
the  functor  formal  parameter  and  the  functor  actual  parameter  so 
that  they  are  both  abstract,  or  so  that  neither  is  abstract. 

The problem is that a functor parameterized by the LIST signature cannot determine
whether  the  contents  of  cons  cells  can  be  flattened;  any  such  functor  will  be  compiled 
assuming  that  cons  cells  are  not  flat,  whereas  the  structure  List  will  be  compiled  so  that 
cons  cells  are  flattened.  My  approach  makes  no  such  restriction  because  I  dynamically 
determine  the  representation  of  abstract  data  structures  when  necessary. 

8.5.4  Flattening  Arguments 

After  flattening  Enumorrec_c  values,  I  flatten  function  arguments  in  the  same  manner  as 
suggested  in  Chapter  5.  If  a  function  takes  a  record  as  an  argument  and  the  number  of 
elements in the record does not exceed a constant k, then the function is transformed to
take  the  elements  of  the  record  as  multiple  arguments.  If  a  function  takes  an  argument 
of known type that is not a record (or else the function takes a record with more than
k components⁷), then the function is not transformed. If a function takes an argument of
unknown  type,  then  the  function  is  compiled  expecting  a  single  argument.  The  vararg 
primitive  is  used  to  calculate  a  coercion  dynamically,  based  on  the  instantiation  of  the 
unknown  type. 

Likewise,  an  application  is  transformed  so  that,  if  the  argument  is  a  record  and  the 
number  of  elements  in  the  record  does  not  exceed  k,  then  the  components  of  the  record 
are  passed  directly  as  multiple  arguments.  If  the  argument  has  known  type,  but  is  either 
not  a  record  or  else  is  a  record  of  greater  than  k  components,  then  the  application  is  not 
transformed.  If  an  application  has  an  argument  of  unknown  type,  then  I  use  the  onearg 
primitive  to  calculate  a  coercion  dynamically,  based  on  the  instantiation. 
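
The following Python sketch illustrates the idea behind the two coercions; it is not TIL's implementation. It models a record as a Python tuple and a type as a small tagged tuple, takes K = 8 from the footnote on the prototype, and uses the names vararg and onearg only as stand-ins for the primitives. The helper is_flat_record and the example function first are hypothetical.

```python
K = 8  # flattening limit; the text says the prototype arbitrarily uses eight


def is_flat_record(ty):
    """A type is flattened when it is a record with at most K fields."""
    return ty[0] == "record" and ty[1] <= K


def vararg(ty, f):
    """Coerce a one-argument function to the calling convention for ty."""
    if is_flat_record(ty):
        return lambda *fields: f(tuple(fields))  # gather fields into a record
    return f                                     # identity coercion


def onearg(ty, g):
    """Coerce a function using the convention for ty back to one argument."""
    if is_flat_record(ty):
        return lambda record: g(*record)         # spread the record's fields
    return g


def first(pair):
    """Compiled for an argument of unknown type: expects a single argument."""
    return pair[0]


pair_ty = ("record", 2)               # the instantiation, known only at run time
flat_first = vararg(pair_ty, first)   # now callable with two arguments
```

Here flat_first(1, 2) and onearg(pair_ty, flat_first)((1, 2)) both return 1, and for a non-record instantiation both coercions leave the function unchanged.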


8.6  Bform  and  Optimization 

After  the  translation  to  Lmli,  and  after  the  series  of  type-directed  transformations,  we 
translate Lmli to Bform. Bform is a subset of Lmli that makes an explicit distinction between
small values and constructors that fit into registers, large values and constructors
that must be allocated on the heap, and computations that produce values or constructors.

At  the  term  level,  small  values  are  either  variables,  projections  from  a  module,  unit 
(an  empty  record),  integers,  floats,  enums,  external  labels,  or  coercions  applied  to  a 
small  value.  Large  values  include  strings,  records  of  small  values,  and  functions.  At  the 
constructor  level,  small  values  are  either  variables,  projections  from  modules,  an  empty 
tuple  of  constructors,  or  a  0-ary  primitive  constructor  (e.g.,  Int_c).  Large  values  include 
tuples  and  lists  of  constructors  as  well  as  primitive  constructors  that  take  arguments 
(e.g.,  Record_c). 

⁷ In the current prototype, k is arbitrarily set to eight arguments.
All large values and all computations are bound to variables in a declaration and
we  use  the  variable  in  place  of  the  large  value  or  computation.  Thus,  expressions  always 
manipulate  small  values.  These  constraints  ensure  that  we  avoid  duplicating  large  objects 
(such  as  strings)  and  preserve  as  much  sharing  as  possible.  These  constraints  also  have 
the  effect  of  linearizing  nested  computations  and  naming  intermediate  results.  All  of 
these  constraints  simplify  standard  optimization. 
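
As a rough illustration of the linearizing effect, the Python sketch below names every nested computation in a toy expression language. It is a simplification under stated assumptions: expressions are tuples such as ("add", e1, e2), Python integers and strings stand for small values (integers and variables), and the generated names t1, t2, ... are hypothetical. It does not model constructors or the large/small distinction for strings and records.

```python
import itertools


def linearize(expr, counter=None, bindings=None):
    """Return (small_value, bindings) with every computation bound to a name."""
    if counter is None:
        counter = itertools.count(1)
    if bindings is None:
        bindings = []
    if isinstance(expr, (int, str)):       # integers and variables are small
        return expr, bindings
    op, *args = expr                       # a computation such as ("add", a, b)
    small_args = [linearize(a, counter, bindings)[0] for a in args]
    name = f"t{next(counter)}"             # fresh variable for this result
    bindings.append((name, (op, *small_args)))
    return name, bindings


result, decls = linearize(("add", ("mul", "x", 2), ("mul", "y", 3)))
```

After the call, decls lists the bindings for the two products followed by the binding for the sum, and result is the variable that names the sum, so every operator is applied only to small values.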

Numerous transformations and optimizations are applied to programs in the Bform
phase. (See Tarditi [115] for a more complete description of these optimizations.) The
optimizations include the following conventional transformations:

•  alpha-conversion:  We  assign  unique  names  to  all  bound  variables. 

• minimizing fix: We break functions into minimal sets of mutually recursive functions.
This improves inlining, by separating non-recursive and recursive functions.

• dead-code elimination: We eliminate unreferenced pure expressions and functions.

• uncurrying: We transform curried functions to multi-argument functions whenever
all of the call sites of the curried function can be determined.

•  constant  folding:  We  reduce  arithmetic  operations,  switches,  and  typecases  on 
constant  values,  as  well  as  projections  from  known  records. 

•  sinking:  We  push  pure  expressions  used  in  only  one  branch  of  a  switch  into  that 
branch. 

•  inlining:  We  always  inline  functions  that  are  applied  only  once.  We  never  inline 
recursive  functions.  We  inline  non-recursive,  “small”  functions  in  a  bottom-up  pass. 

•  inlining  switch  continuations:  We  inline  the  continuation  of  a  switch,  when  all 
but  one  clause  raises  an  exception.  For  example,  the  expression 

let x = if y then e2 else raise e3
in e4
end

is  transformed  to 

if  y  then  let  x  =  e2  in  e4  end  else  raise  e3 . 

This  makes  expressions  in  e2  available  within  e4  for  optimizations  such  as  common 
sub-expression  elimination. 
• common subexpression elimination (CSE): Given an expression

let x = e1
in e2
end

if e1 is pure, then we replace all occurrences of e1 in e2 with x. Pure expressions
include operations such as record projection that are guaranteed to terminate
without effect, but exclude signed arithmetic (due to the possibility of overflow and
divide-by-zero exceptions) and function calls.

•  eliminating  redundant  switches:  Given  an  expression 

let x = if z then
          let val y = if z then e1 else e2
          in ...

we replace the nested if statement by e1, since z is always true at that point.

•  hoisting  invariant  computations:  Using  the  call  graph,  we  calculate  the  nesting 
depth  of  each  function.  We  assign  a  let-bound  variable  and  the  expression  it  binds 
a  nesting  depth  equal  to  that  of  the  nearest  enclosing  function.  For  every  pure 
expression e, if all free variables of e have a nesting depth less than that of e, we move the
definition of e to just after the definition of the free variable with the greatest lexical
nesting depth.

•  eliminating  redundant  comparisons:  We  propagate  a  set  of  simple  arithmetic 
relations  of  the  form  x  <  y  top-down  through  the  program  and  a  “rule-of-signs” 
abstract interpretation is used to determine signs of variables. We use this information
to eliminate array-bounds checks and other tests.
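
To make one of these transformations concrete, the Python sketch below performs common subexpression elimination over a list of (name, expression) bindings. It is an illustration rather than TIL's optimizer: the tuple encoding of expressions, the operator names "proj" and "add", and the set PURE_OPS are assumptions made for the example. Following the description above, record projection is treated as pure and signed addition is not.

```python
PURE_OPS = {"proj"}   # record projection; signed arithmetic and calls are excluded


def cse(bindings):
    """Eliminate repeated pure computations in a list of (name, expr) bindings."""
    available = {}    # pure expression -> variable already holding its value
    renamed = {}      # eliminated variable -> surviving variable
    out = []
    for name, (op, *args) in bindings:
        args = tuple(renamed.get(a, a) for a in args)
        expr = (op, *args)
        if op in PURE_OPS and expr in available:
            renamed[name] = available[expr]   # reuse the earlier result
            continue
        if op in PURE_OPS:
            available[expr] = name
        out.append((name, expr))
    return out


program = [
    ("a", ("proj", "r", 1)),
    ("b", ("proj", "r", 1)),     # same pure projection as a
    ("c", ("add", "a", "b")),    # may overflow, so never shared
    ("d", ("add", "a", "b")),
]
optimized = cse(program)
```

The binding for b disappears and later uses of b become a, while the two additions are both kept because an overflow exception is an observable effect.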

In  addition  to  these  standard  optimizations,  I  perform  a  set  of  type-specific  optimizations: 

•  eliminating  peq:  If  the  polymorphic  equality  primitive  is  applied  to  a  known  type, 
and  the  number  of  syntax  nodes  in  the  type  is  smaller  than  some  parameter,  then  I 
generate  special  equality  code  for  that  type.  I  delay  performing  this  specialization 
until  after  hoisting  and  common  subexpression  elimination  to  avoid  duplication. 

•  eliminating  vararg  and  onearg:  As  suggested  in  Section  5.2.5,  the  onearg  and 
vararg  primitives  cancel.  I  use  this  to  eliminate  applications  of  onearg  and  vararg. 
Also,  as  types  become  known,  I  specialize  the  onearg  and  vararg  primitives  to  the 
appropriate  coercion.  As  with  peq,  I  delay  performing  this  specialization  until  after 
hoisting  and  common  subexpression  elimination  to  avoid  duplication. 
• hoisting type applications: Because we make the restriction at the source level
that all expressions assigned a ∀-type must be values (i.e., effect-free), we are assured
that a type application is effect-free. Furthermore, the back-end does not
introduce any polymorphic functions with computational effects, and thus all type
applications are effect-free. Therefore, like other pure operations, we hoist type
applications.
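
The peq specialization can be pictured with the following Python sketch. It is only a model under stated assumptions: a type is either a base-type name such as "int" or a pair ("record", [field types]), the limit MAX_NODES is a hypothetical value for the unspecified size parameter, and the function peq merely stands in for the polymorphic equality primitive by re-analysing the type on every call.

```python
MAX_NODES = 16   # hypothetical size limit; the text only says "some parameter"


def size(ty):
    """Number of syntax nodes in a type such as "int" or ("record", [tys])."""
    if isinstance(ty, str):
        return 1
    return 1 + sum(size(t) for t in ty[1])


def specialize_eq(ty):
    """Build an equality function specialized to the known type ty."""
    if isinstance(ty, str):                           # base type: direct comparison
        return lambda a, b: a == b
    field_eqs = [specialize_eq(t) for t in ty[1]]     # one test per record field
    return lambda a, b: all(eq(x, y) for eq, x, y in zip(field_eqs, a, b))


def peq(ty):
    """Stand-in for polymorphic equality: analyse the type at every call."""
    return lambda a, b: specialize_eq(ty)(a, b)


def eliminate_peq(ty):
    """Use specialized code for small known types, the primitive otherwise."""
    return specialize_eq(ty) if size(ty) < MAX_NODES else peq(ty)


pair_ty = ("record", ["int", ("record", ["int", "int"])])
eq_pair = eliminate_peq(pair_ty)
```

Because pair_ty has only five syntax nodes, eq_pair is the specialized comparison, which inspects the type once when the code is generated instead of at each application.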

Currently, we apply the optimizations as follows. First, we perform a round of reduction
optimizations, including dead-code elimination, constant folding, inlining functions
called once, CSE, eliminating redundant switches, and invariant removal. These optimizations
do not increase program size and should always result in better code. We
iterate these optimizations until no further reductions occur. Then we perform switch-continuation
inlining, sinking, uncurrying, comparison elimination, fix minimizing, and
general inlining. The entire optimization process is then iterated for some adjustable
number of times (currently three).
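
The schedule just described can be sketched as a small driver loop. The Python code below is an assumption-laden outline, not TIL's code: each pass is modelled as a function from program to program, the argument names reductions and others are hypothetical, and the toy passes over a list of integers merely stand in for real transformations.

```python
def optimize(program, reductions, others, rounds=3):
    """Run the reductions to a fixed point, then the other passes, `rounds` times."""
    for _ in range(rounds):
        changed = True
        while changed:                      # until no further reductions occur
            changed = False
            for reduce_pass in reductions:
                new_program = reduce_pass(program)
                if new_program != program:
                    program, changed = new_program, True
        for other_pass in others:           # e.g. sinking, uncurrying, general inlining
            program = other_pass(program)
    return program


def drop_zeros(program):
    """Toy reduction: removes elements and therefore never grows the program."""
    return [x for x in program if x != 0]


def append_zero(program):
    """Toy non-reducing pass: may expose new work for the next round."""
    return program + [0]


final = optimize([1, 0, 2, 0], [drop_zeros], [append_zero], rounds=3)
```

The inner loop terminates only if each reduction pass eventually stops changing the program, which matches the requirement that reductions never increase program size.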


8.7  Closure  Conversion 

The  closure  conversion  phase  of  TIL  is  based  on  the  formal  treatment  of  closure  conversion 
given  in  Chapter  6,  but  following  Kranz  [78]  and  Appel  [9],  I  extended  the  translation  to 
avoid  creating  closures  and  environments  unless  functions  “escape”.  A  function  escapes 
if it is placed in a data structure, passed as an argument to another function, or is
returned  as  the  result  of  a  function.  If  a  function  does  not  escape,  then  all  of  its  call  sites 
can  be  determined  and  all  of  the  free  variables  of  the  function  are  available  at  the  call 
sites.  Therefore,  we  transform  non-escaping  functions  to  code  that  takes  all  of  their  free 
variables  as  additional  arguments,  but  we  avoid  creating  an  environment  and  closure  for 
the  function.  Instead,  we  modify  the  call  sites  of  each  function  to  pass  these  extra  values 
directly  to  the  code. 

A transformed call site may mention variables that occur free in the function being
called, but not in the calling function. Therefore, we must take the set of applications
of non-escaping functions into account when calculating the free variables of a
function.
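
One way to picture this calculation is as a fixed-point iteration over the call graph. The Python sketch below assumes a deliberately simplified model that is not taken from TIL: each function has a set of variables free in its own body (direct_free), a set of variables it binds (bound), and a set of functions it applies (calls). All of these names are hypothetical.

```python
def free_variables(direct_free, bound, calls, non_escaping):
    """Free variables per function, counting the extra arguments that calls to
    non-escaping functions must pass at each call site."""
    free = {f: set(vs) for f, vs in direct_free.items()}
    changed = True
    while changed:                                    # iterate to a fixed point
        changed = False
        for caller, callees in calls.items():
            for callee in callees & non_escaping:
                # The call site mentions the callee's free variables; those the
                # caller does not bind itself become free in the caller.
                extra = free[callee] - bound[caller] - free[caller]
                if extra:
                    free[caller] |= extra
                    changed = True
    return free


direct_free = {"g": {"x"}, "f": {"y"}, "h": set()}
bound = {"g": set(), "f": set(), "h": {"y"}}
calls = {"f": {"g"}, "h": {"f"}}
result = free_variables(direct_free, bound, calls, {"f", "g"})
```

In this example f must receive x because it calls g, and h in turn must receive x because it calls f, although x appears in neither body.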

We  use  a  flat  constructor  tuple  to  represent  constructor  environments,  and  a  flat 
record  to  represent  value  environments.  These  environments  are  always  allocated  on  the 
heap. To support recursion, we simultaneously define the environments and closures of
a set of mutually recursive functions using a Scheme-style “letrec” declaration.

TIL does not close over variables bound at the top level (i.e., outside of any function).
Such  variables  are  mapped  to  labels  (machine  addresses)  by  lower  levels  of  the  compiler 
and  thus  can  be  directly  addressed.  In  practice,  this  results  in  a  two-level  environment, 
where  a  data  pointer  is  used  to  access  top-level  values  and  a  closure  pointer  is  used  to 
access values defined within a function. The advantage of this approach is that heap-allocated
environments can be substantially smaller. The disadvantage is that values
bound at the top level cannot easily be garbage collected, since these values are bound
to labels.

After  closure  conversion,  we  perform  another  round  of  optimization  in  an  effort  to 
clean  up  any  inefficiencies  introduced  by  closure  conversion.  Some  optimizations,  notably 
invariant  removal  and  inlining,  are  turned  off  since  they  do  not  preserve  the  invariants  of 
closure  conversion. 


8.8  Ubform,  Rtl,  and  Alpha 

After  closure  conversion  and  closure  optimization,  we  translate  the  resulting  code  to 
Ubform.  The  Ubform  intermediate  language  is  an  untyped  Bform,  but  each  variable  is 
labelled  with  representation  information.  We  erase  the  distinction  between  computations 
at the constructor and term levels in the translation to Ubform. Hence, constructor variables
become term variables, constructor values become term values, and constructor
computations become term computations. Mono_k constructors that take no arguments, such as
Int_c, are represented as an enumeration, whereas Mono_k constructors that take an argument, such
as Record_c, are represented as tagged, variant records. Thus, the entire kind of Mono_k
constructors is represented in the same fashion as an SML datatype.

The  representation  information  on  Ubform  variables  indicates  whether  each  variable 
is  an  integer,  float,  pointer  to  a  heap-allocated  value,  or  of  unknown  representation 
at compile-time. Enum_c, Enumorrec_c, and Enumorsum_c values are considered to be
pointers since the garbage collector can always determine whether they are in fact pointers
at run time. Variables with unknown representation are annotated with other variables
(corresponding to Bform type variables) that will contain the representation at run
time. An earlier stage boxes floating point values, thereby guaranteeing that variables of
unknown  representation  are  never  floats.  This  invariant  allows  the  register  allocator  to 
always  assign  a  general  purpose  register  to  variables  of  unknown  representation.  Without 
the  invariant,  the  register  allocator  would  have  to  assign  both  a  general  purpose  machine 
register  and  a  floating  point  register  to  the  variable  and  use  dynamic  type  analysis  to 
decide  which  of  the  two  registers  to  use. 

The Ubform representation is quite similar to a direct-style version of the CPS intermediate
form used by Shao and Appel in the SML/NJ compiler [110]. While Shao
and  Appel  claim  that  their  compiler  is  type-based,  they  only  use  representation-based, 
untyped  intermediate  forms.  Hence,  it  is  not  possible,  in  general,  to  verify  automatically 
that  any  of  their  intermediate  representations  are  type-safe.  In  contrast,  only  the  last 
stages  of  TIL  are  untyped  and  a  type  checker  can  be  used  to  verify  automatically  the 
type  integrity  of  the  code,  even  after  optimization  and  closure  conversion.  Furthermore, 
the representation information we use at the Ubform level is more general than the representation
information used by Shao and Appel, since we allow dynamic instantiation
of  representation  information. 

Currently,  no  optimization  or  other  transformations  occur  at  the  Ubform  level.  We 
simply  use  the  translation  to  Ubform  as  a  convenient  way  to  stage  the  compilation  of  the 
closure-converted  code  to  the  next  intermediate  form.  This  next  form  is  called  Rtl,  which 
stands for Register Transfer Language. Rtl is similar to Alpha, MIPS, and other RISC-style
assembly languages. However, it provides heavy-weight function call and return
mechanisms, and a form of interprocedural goto for implementing exceptions. Rtl also
provides an infinite number of pseudo-registers. In the conversion from Ubform to Rtl,
we decide whether Ubform variables will be represented as constants, labels, or pseudo-registers.
During the conversion, we also eliminate exceptions, insert tagging operations
for  records  and  arrays,  and  insert  garbage  collection  checks. 

The  Rtl  level  would  be  suitable  for  a  conventional,  low-level  imperative  optimizer, 
similar to the ones found in C and Fortran compilers. We perform a few small optimizations,
notably collapsing garbage collection checks and eliminating redundant loads of
small  constants. 

Finally, the Rtl representation is translated to Alpha. Alpha is DEC Alpha assembly
language,  with  extensions  similar  to  those  for  Rtl.  In  the  translation  from  Rtl  to  Alpha, 
we  use  conventional  graph-coloring  register  allocation  to  allocate  physical  registers  for 
the  Rtl  pseudo-registers.  We  also  construct  tables  describing  the  layout  and  garbage 
collection  information  for  each  stack  frame. 


8.9  Garbage  Collection 

The translation from Ubform to Rtl and the translation from Rtl to Alpha maintain
the  representation  information  that  annotates  variables.  This  representation  information 
is  used  to  construct  tables  for  garbage  collection.  These  tables  tell  the  collector  which 
registers  and  which  stack  slots  contain  pointers  to  heap-allocated  objects. 

Abstractly,  we  record  enough  information  to  determine  which  registers  and  which 
stack  slots  are  live  at  every  call  site,  and  whether  or  not  to  trace  these  values,  based  on 
the  representation  information.  We  use  the  return  address  of  call  sites  as  an  index  to 
find  the  information  and  ensure  that  the  return  address  is  always  saved  in  the  first  slot 
of  a  stack  frame.  In  these  respects,  our  collector  closely  resembles  Britton’s  collector  for 
Pascal  [23]  and  the  formal  development  of  Chapter  7. 

However, our collector is complicated by two details. The first complication is that
some values have unknown representation at compile time. At the Ubform level, these
values are labelled with another variable (corresponding to a type variable) that, at run
time, indicates the representation of the value. Hence, for values of unknown representation,
we must record where this other variable can be found so that the collector can
determine whether or not to trace the original value. In this respect, our collector resembles
Tolmach’s tag-free collector for SML [119]. However, Tolmach calculates unknown
representations lazily during garbage collection, because he does not have a general programming
language at the type level. In particular, he only supports substitution at the
constructor level and not, for instance, Typerec. In contrast, our constructor computations
can perform type analysis, function call, and allocation. Therefore, we calculate
unknown representations eagerly, during program evaluation, so that all representations
are already calculated before garbage collection is invoked.
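
A minimal Python sketch of how such a table might be consulted follows. Everything concrete in it is an assumption made for illustration: the table GC_TABLE, the return address 0x4000, the encoding of a frame as a list of slot values, and the two representation names. It models only stack slots, and it relies on the eager strategy described above, so the slot that carries a representation already holds its final value when the collector runs.

```python
TRACE, NO_TRACE = "pointer", "int"   # the two run-time representations modelled

# Hypothetical table: return address -> one entry per stack slot.
# An entry is a known representation, or ("unknown", i) meaning that slot i
# of the same frame holds the representation, computed before any collection.
GC_TABLE = {
    0x4000: [NO_TRACE, TRACE, ("unknown", 3), NO_TRACE],
}


def roots(return_address, frame):
    """Return the values in `frame` that the collector must trace."""
    live = []
    for value, rep in zip(frame, GC_TABLE[return_address]):
        if isinstance(rep, tuple):     # representation unknown at compile time
            rep = frame[rep[1]]        # read it from the designated slot
        if rep == TRACE:
            live.append(value)
    return live


pointer_case = roots(0x4000, [7, "obj_a", "obj_b", "pointer"])
integer_case = roots(0x4000, [7, "obj_a", 42, "int"])
```

The same table entry yields different tracing decisions in the two calls, because the decision for slot 2 is taken from the representation stored in slot 3.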

The  second  complication  is  that  we  split  the  registers  into  a  set  of  caller-saves  and 
a  set  of  callee-saves  registers.  Callee-saves  registers  are  used  to  hold  values  needed  after 
a  procedure  call.  To  use  a  callee-saves  register  as  a  temporary,  a  procedure  must  save 
the  contents  of  the  register  on  the  stack  and  restore  the  contents  before  returning  to  the 
caller. 

In effect, callee-saves registers are like extra arguments to a procedure that are simply
returned with the result of the procedure. Unfortunately, the types and thus the
representations  of  these  extra  arguments  are  unknown  to  the  called  procedure.  We  solve 
this  issue  by  recording  when  callee-saves  registers  are  saved  into  stack  slots  and  when 
variables  are  placed  into  callee-saves  registers.  During  garbage  collection,  we  process  the 
stack  from  oldest  frame  to  youngest  frame.  Initially,  the  callee-saves  registers  are  not 
live.  If  the  first  procedure  places  values  into  the  callee-saves  registers,  then  it  knows  the 
representations  of  these  values.  We  propagate  this  information  to  the  next  stack  frame. 
If  the  next  procedure  spills  a  callee-saves  register  to  the  stack,  then  we  can  determine  the 
representation of the stack slot from the propagated representation information. Otherwise,
we simply forward the representation information to the next stack frame, and
so on. This approach to reconstructing type information is similar to the approach suggested
by Appel [8] and Goldberg and Gloger [49, 50]. Once we determine which registers
and  which  stack  slots  must  be  traced,  we  perform  a  standard  copying  garbage  collection 
on  the  resulting  roots.  Currently,  we  use  a  simple  two-generation  collector. 
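
The propagation of callee-saves information can be sketched as a single pass over the frames from oldest to youngest. The Python model below is hypothetical in its details: each frame is a dictionary with a 'spills' map (stack slot to the callee-saves register saved there) and a 'defines' map (register to the representation of the value the procedure places in it), and "dead" marks a register that no caller has made live.

```python
def callee_save_slot_reps(frames):
    """frames: oldest first. Returns, per frame, the representation of each
    stack slot that holds a spilled callee-saves register."""
    reg_reps = {}                 # initially no callee-saves register is live
    result = []
    for frame in frames:
        # A spilled register holds a caller's value, so its representation
        # comes from the information propagated from older frames.
        slots = {slot: reg_reps.get(reg, "dead")
                 for slot, reg in frame["spills"].items()}
        result.append(slots)
        # Values this procedure places in callee-saves registers are known to
        # it; forward them, together with untouched registers, to the next frame.
        reg_reps = {**reg_reps, **frame["defines"]}
    return result


frames = [
    {"spills": {}, "defines": {"s0": "pointer"}},             # oldest frame
    {"spills": {}, "defines": {}},                            # just forwards
    {"spills": {0: "s0", 1: "s1"}, "defines": {"s0": "int"}}, # youngest frame
]
slot_reps = callee_save_slot_reps(frames)
```

In the youngest frame, slot 0 must be traced because it holds the pointer that the oldest procedure placed in s0, whereas slot 1 holds a register that was never made live.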


8.10  Performance  Analysis  of  TIL 

In  this  section,  I  compare  the  performance  of  code  produced  by  TIL  against  code  produced 
by the SML/NJ compiler [12]. I also examine other aspects, including heap allocation,
physical memory requirements, executable size, and compile time. The goal is to show
that, for a reasonable set of benchmarks, TIL produces code that is comparable to (or
better than) the code produced by SML/NJ, at least for the subset of SML that TIL
currently supports.

However, I make no attempt to compare TIL and SML/NJ except for these end-to-end
measurements. There are many differences between these two systems, so we cannot
directly compare particular implementation choices (such as whether or not to use tag-free
collection), simply because we cannot fix all of the other variables. By showing that
TIL  code  is  comparable  to  SML/NJ,  I  hope  to  persuade  the  reader  that  a  type-based 
implementation  of  SML  that  uses  novel  technologies,  such  as  dynamic  type  dispatch  and 
tag-free  collection,  can  compete  with  one  of  the  best  existing  ML  compilers. 

8.10.1  The  Benchmarks 

I  chose  a  set  of  small  to  medium-sized  benchmarks  ranging  from  a  few  lines  up  to  2000 
lines  of  code  to  measure  the  performance  of  TIL.  Larger  programs  would  be  desirable,  but 
there  are  few  large  SML  programs  that  do  not  use  nested  modules  or  functors.  Table  8.1 
describes  these  programs.  The  benchmarks  cover  a  range  of  application  areas  including 
scientific  computing,  list  processing,  systems  programming,  and  compilers.  Some  of  these 
programs  have  been  used  previously  for  measuring  ML  performance  [9,  36].  Others  were 
adapted from the Caml-Light distribution [24].

For  this  set  of  comparisons,  I  compiled  all  of  the  programs  as  single  closed  modules. 
For  lexgen  and  simple,  which  are  standard  benchmarks  [9],  I  eliminated  functors  by 
hand,  since  TIL  does  not  yet  support  functors. 

For TIL, I compiled programs with all optimizations enabled. For SML/NJ, I compiled
programs using the default optimization settings. I used a recent internal release
of  SML/NJ  (a  variant  of  version  108.3),  since  it  produces  code  that  is  about  35%  faster 
than  the  standard  0.93  release  of  SML/NJ  [110]. 

For  both  compilers,  we  extended  the  built-in  types  with  safe  2-dimensional  arrays. 
The  2-d  array  operations  perform  bounds  checking  on  each  dimension  and  then  use  unsafe 
1-d  array  operations.  Arrays  are  stored  in  column-major  order. 
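
The layout can be illustrated with the Python sketch below. It is not the prelude code used for the benchmarks: the class name Array2 and its methods sub and update are chosen for the example, and because Python lists always check their bounds, the final flat access only approximates an unsafe 1-d operation.

```python
class Array2:
    """Safe 2-d array stored in column-major order in a flat list."""

    def __init__(self, rows, cols, init):
        self.rows, self.cols = rows, cols
        self.data = [init] * (rows * cols)      # underlying 1-d array

    def _index(self, i, j):
        if not 0 <= i < self.rows:              # check each dimension separately
            raise IndexError("row out of bounds")
        if not 0 <= j < self.cols:
            raise IndexError("column out of bounds")
        return j * self.rows + i                # column-major offset

    def sub(self, i, j):
        return self.data[self._index(i, j)]     # flat access needs no further check

    def update(self, i, j, value):
        self.data[self._index(i, j)] = value


grid = Array2(2, 3, 0)
grid.update(1, 2, 5)      # row 1, column 2 lands at flat offset 2 * 2 + 1
```

Checking each dimension separately matters: an out-of-range row such as grid.sub(2, 0) would otherwise map to a valid flat offset belonging to the next column.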

TIL  automatically  prefixes  a  set  of  operations  onto  each  module  that  it  compiles.  This 
“inline” prelude is about 280 lines in length. It contains 2-d array operations, commonly-used
list functions, and so forth. By prefixing the module with these definitions, we
ensure  that  they  are  exposed  to  the  optimizer.  To  avoid  handicapping  SML/NJ,  I  created 
separate  copies  of  the  benchmark  programs  for  SML/NJ,  and  carefully  placed  equivalent 
“prelude”  code  at  the  beginning  of  each  program. 

Since  TIL  creates  stand-alone  executables,  I  used  the  exportFn  facility  of  SML/NJ 
to  create  stand-alone  programs.  The  exportFn  function  of  SML/NJ  dumps  part  of  the 
heap  to  disk  and  throws  away  the  interactive  system. 

8.10.2  Comparison  against  SML/NJ 

I  compared  the  performance  of  TIL  against  SML/NJ  in  several  dimensions:  execution 
time,  total  heap  allocation,  physical  memory  footprint,  the  size  of  the  executable,  and 
compilation time.

Program   Lines   Description
cksum       241   Checksum fragment from the Foxnet [18], doing 5000 checksums
                  on a 4096-byte array, using a stream interface [17].
dict        166   Insert 10,000 strings, indexed by integers, into a balanced
                  binary tree, lookup each string and replace it with another.
                  The balanced binary trees are taken from the SML/NJ library.
fft         246   Fast Fourier transform.
fmult        63   Matrix multiply of two 100x100 floating point matrices.
imult        63   Matrix multiply of two 200x200 integer matrices.
kb          618   The Knuth-Bendix completion algorithm.
lexgen     1123   A lexical-analyzer generator [13], processing the lexical
                  description of SML/NJ.
life        146   A simulation of cells implemented using lists [103].
logic       459   A simple Prolog-like interpreter, with unification and
                  backtracking.
msort        45   List merge sort of 5,120 integers, 40 times.
pia        2065   A Perspective Inversion Algorithm [125] deciding the location
                  of an object in a perspective video image.
qsort       141   Integer array quicksort of 50,000 pseudo-random integers, 2
                  times.
sieve        27   Sieve of Eratosthenes, filtering primes up to 30000.
simple      870   A spherical fluid-dynamics program [39], run for 4 iterations
                  with grid size of 100.
soli        131   A solver for a peg-board game.

Table 8.1: Benchmark Programs

Figure 8.4: TIL Execution Time Relative to SML/NJ

I  measured  execution  time  on  a  DEC  Alpha  AXP  250-4/266  workstation,  running 
OSF/1,  version  V3.2,  using  the  UNIX  getrusage  function.  For  SML/NJ,  I  started 
timing  after  the  heap  had  been  reloaded.  For  TIL,  I  measured  the  entire  execution  time 
of  the  process,  including  load-time.  I  made  5  runs  of  each  program  on  an  unloaded 
workstation  and  chose  the  lowest  execution  time.  The  workstation  had  96  Mbytes  of 
physical  memory,  so  paging  was  not  a  factor  in  the  measurements. 

I  measured  total  heap  allocation  by  instrumenting  the  TIL  runtime  to  count  the  bytes 
allocated.  I  used  existing  instrumentation  in  the  SML/NJ  run-time  system.  I  measured 
the  maximum  amount  of  physical  memory  during  execution  using  getrusage. 

To  compare  program  sizes,  I  first  compiled  empty  programs  under  TIL  and  under 
SML/NJ.  The  empty  program  for  TIL  generates  a  stripped  executable  that  is  around 
250  Kbytes,  whereas  the  empty  program  for  SML/NJ  consists  of  roughly  425  Kbytes 
from  the  runtime,  and  170  Kbytes  from  the  heap,  for  a  total  of  595  Kbytes.  Next,  I 
stripped  all  executables  produced  by  TIL,  and  then  subtracted  the  size  of  the  empty 
program  (250  Kbytes)  from  the  size  of  each  program.  For  SML/NJ,  I  measured  the  size 
of  the  heap  generated  by  exportFn  for  each  program  and  subtracted  the  size  of  the  heap 
generated  by  the  empty  program  (170  Kbytes). 

Finally,  I  measured  end-to-end  compilation  time,  including  time  to  assemble  files 
produced  by  TIL  and  time  to  export  a  heap  image  for  SML/NJ. 

Figures  8.4  through  8.8  present  the  measurements.  The  raw  numbers  appear  in  Tables 
8.2  through  8.6.  For  each  benchmark,  measurements  for  TIL  were  normalized  to  those 
for  SML/NJ  and  then  graphed.  SML/NJ  represents  the  100%  mark  on  all  the  graphs, 
indicated by a solid horizontal line.

Figure 8.5: TIL Heap Allocation Relative to SML/NJ (excluding fmult and imult)

Figure 8.6: TIL Physical Memory Used Relative to SML/NJ

Figure 8.7: TIL Executable Size Relative to SML/NJ (without runtimes)

Figure 8.8: TIL Compilation Time Relative to SML/NJ

Figure  8.4  presents  relative  running  times.  On  average,  programs  compiled  by  TIL 
run  about  two  times  faster  than  programs  compiled  by  SML/NJ.  All  programs  but  kb 
and  msort  run  faster  under  TIL  than  under  SML/NJ.  Furthermore,  the  TIL  versions 
of  the  largest  programs,  lexgen,  pia,  and  simple,  are  more  than  twice  as  fast  as  their 
SML/NJ  counterparts.  Finally,  the  slowest  program,  msort,  is  no  more  than  50%  slower 
than  the  SML/NJ  version. 

The  kb  benchmark  uses  exceptions  and  exception  handlers  quite  frequently.  TIL  does 
a  relatively  poor  job  of  register  saving  and  restoring  around  exception  handlers  and  I 
suspect  that  this  is  the  reason  for  its  poor  performance  on  this  benchmark.  The  poor 
performance  of  msort  is  most  likely  due  to  the  difference  in  garbage  collectors  for  the  two 
systems,  since  roughly  two-thirds  of  the  running  time  for  TIL  is  spent  in  the  collector.  I 
speculate  that  the  multi-generational  collector  of  SML/NJ  does  a  better  job  of  memory 
management  for  this  benchmark. 

Since  SML/NJ  flattens  arguments  using  Leroy-style  coercions,  and  also  flattens  some 
constructors (see Section 8.5.3), the primary difference in performance for most benchmarks
is not due to my type-directed translations. Most likely, the primary difference in
performance  is  due  to  the  conventional  optimizations  that  TIL  employs  [115].  What  is 
remarkable  is  that,  even  though  TIL  employs  more  optimizations  than  SML/NJ,  the  use 
of  types  and  dynamic  type  dispatch  does  not  interfere  with  optimization.  Furthermore, 
for some benchmarks (notably fft and simple) much, if not all, of the performance improvement
is due to the type-directed array flattening (see Section 8.10.4). Regardless, a
reasonable conclusion to draw from these measurements is that type-directed compilation
and dynamic type dispatch do not interfere with optimization and, when coupled with
a good optimizer, yield code that competes quite well with existing compilers.

Figure  8.5  compares  the  relative  amounts  of  heap  allocation  between  TIL  and 
SML/NJ, except for the fmult and imult benchmarks. The TIL version of fmult allocates
over 16 Mbytes of data, whereas the SML/NJ version allocates around 1
Kbyte. This is entirely because TIL does not flatten floating point values into registers
across  function  calls.  During  the  dot  product  loop  of  the  TIL  version,  floating  point 
values  are  pulled  out  of  the  arrays,  multiplied,  and  added  to  an  accumulator,  and  the 
accumulator  is  boxed  as  it  is  passed  around  the  loop.  Under  SML/NJ,  the  accumulator 
remains  unboxed.  In  contrast,  the  TIL  version  of  imult  does  not  allocate  at  all  at  run 
time,  whereas  the  SML/NJ  version  allocates  about  1  Kbyte.  Even  including  fmult  but 
excluding  imult,  the  geometric  mean  of  the  ratios  of  heap-allocated  data  shows  that  TIL 
programs  allocate  about  34%  of  the  amount  of  data  that  SML/NJ  allocates.  This  low 
percentage  is  not  surprising,  because  TIL  uses  a  stack  for  activation  records,  whereas 
SML/NJ  allocates  activation  records  on  the  heap. 
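
For readers who want to reproduce this kind of summary, the Python sketch below computes a geometric mean of per-benchmark ratios. The ratios in example_ratios are placeholders chosen to make the arithmetic easy to check; they are not the measurements reported here.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of the ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Placeholder ratios (TIL bytes / SML/NJ bytes), NOT the thesis's measurements.
example_ratios = [0.25, 1.0, 4.0]
mean_ratio = geometric_mean(example_ratios)
```

A ratio of zero has no logarithm, and a single zero would force the product, and hence the mean, to zero. This is consistent with excluding imult, whose TIL version allocates nothing at run time.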

Figure  8.6  presents  the  relative  maximum  amounts  of  physical  memory  used.  TIL 
programs tend to use either much less or much more memory than SML/NJ programs.
I  speculate  that  some  variability  is  due  to  the  different  strategies  used  to  size  the  heaps. 
SML/NJ  uses  a  multi-generational  collector  with  a  heap-size-to-live-data  ratio  of  3  to  1 
for  older  generations,  whereas  TIL  uses  a  two-generation  collector  that  has  a  heap-size- 
to-live-data ratio of up to 10 to 1 (the ratio varies). Also, since TIL does not yet properly
implement tail-recursion (tail calls within exception handlers are not implemented
properly), the stack may be larger than it needs to be.

Figure  8.7  compares  executable  sizes,  excluding  runtimes  and  any  pervasives.  On 
average,  TIL  programs  are  about  80%  of  the  size  of  SML/NJ  programs,  and  no  program 
is  more  than  twice  as  big  as  the  SML/NJ  version.  These  sizes  confirm  that  generating 
tables  for  nearly  tag-free  garbage  collection  consumes  a  modest  amount  of  space.  The 
numbers  also  establish  that  the  inlining  strategy  used  by  TIL  produces  code  of  reasonable 
size. 

Figure  8.8  compares  compilation  times  for  TIL  and  SML/NJ.  SML/NJ  does  quite 
a  lot  better  than  TIL  when  it  comes  to  compilation  time,  compiling  about  six  times 
faster.  However,  we  have  yet  to  tune  TIL  for  compilation  speed.  Most  of  the  compile 
time  is  spent  in  the  optimizer  and  the  register  allocator.  We  assume  that  much  of  the 
time  in  the  register  allocator  can  be  eliminated  by  using  an  intelligent  form  of  coalescing 
as  suggested  by  George  and  Appel  [45].  We  assume  that  much  of  the  time  spent  in  the 
optimizer  can  be  eliminated  by  simply  tuning  and  inlining  key  routines. 

Another reason the optimizer is slow is that we always fully normalize a type whenever
we want to determine some property (e.g., the domain or range of an arrow type).
Normalizing is an expensive process that destroys a great deal of sharing. By lazily
normalizing, we hope to improve many optimization phases that depend upon type
information.
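
The difference between eager and lazy normalization can be illustrated with a deliberately simplified model. The Python sketch below is an assumption-laden toy, not the TIL representation: constructors are nested tuples, the only "reduction" is unfolding a named abbreviation from a hypothetical table `DEFS`, and `stats` merely counts unfoldings. Full normalization expands every abbreviation (and copies shared definitions), whereas head normalization stops as soon as the outermost constructor is visible, which is all that is needed to read off the domain of an arrow type.

```python
# Constructors: ("int",), ("arrow", dom, cod), ("pair", a, b),
# or ("name", n), an abbreviation defined in DEFS (hypothetical examples).
DEFS = {
    "point": ("pair", ("int",), ("int",)),
    "shape": ("pair", ("name", "point"), ("name", "point")),
    "area_fn": ("arrow", ("name", "shape"), ("int",)),
}


def normalize(con, stats):
    """Eager strategy: expand every abbreviation anywhere inside `con`."""
    tag = con[0]
    if tag == "name":
        stats["unfolds"] += 1
        return normalize(DEFS[con[1]], stats)
    return (tag,) + tuple(normalize(c, stats) for c in con[1:])


def head_normalize(con, stats):
    """Lazy strategy: expand only until the outermost constructor is exposed."""
    while con[0] == "name":
        stats["unfolds"] += 1
        con = DEFS[con[1]]
    return con


def domain(con, stats, lazy=True):
    """Domain of an arrow type, using either normalization strategy."""
    exposed = head_normalize(con, stats) if lazy else normalize(con, stats)
    if exposed[0] != "arrow":
        raise TypeError("not an arrow type")
    return exposed[1]


lazy_stats = {"unfolds": 0}
eager_stats = {"unfolds": 0}
print(domain(("name", "area_fn"), lazy_stats, lazy=True), lazy_stats)
print(domain(("name", "area_fn"), eager_stats, lazy=False), eager_stats)
```

In the lazy case the domain is returned still folded, so the sharing of `point` inside `shape` survives; in the eager case `point` is expanded twice.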

Finally,  I  speculate  that  a  great  deal  of  time  and  allocation  during  compilation  is 
due  to  our  naive  approach  of  maintaining  type  information.  In  particular,  we  label  each 
bound  value  variable  with  its  type,  and  we  label  each  bound  type  variable  with  its  kind. 
In  the  B-form  representation,  this  means  that  almost  every  construct  has  associated  type 
information  and  this  type  information  contains  a  great  deal  of  kind  information.  Much 
of  the  type/kind  information  is  unneeded  or  can  easily  be  recovered.  Many  primitive 
transformations, such as α-conversion, must process this unneeded information and are
thus  slowed  by  the  inefficient  representation. 


8.10.3  The  Effect  of  Separate  Compilation 

In  this  section,  I  explore  the  effect  that  separate  compilation  has  on  the  performance 
of  some  of  the  benchmarks.  When  programs  are  separately  compiled,  the  optimizer 
cannot  perform  as  many  reductions  and  transformations.  Hence,  the  likelihood  that 
the  resulting  program  will  use  dynamic  dispatch  increases  when  compared  to  the  same 
program compiled all together.

I  took  two  of  the  larger  programs,  logic  and  lexgen,  and  broke  them  into  modules  at 
natural  boundaries,  resulting  in  the  benchmark  programs  logic_s  and  lexgen_s.  These 
benchmarks  should  give  an  indication  of  how  TIL  performs  with  realistic,  separately 
compiled  programs. 

I also took the dict, fmult, imult, and msort benchmarks, placed the core routines
into  separate  modules,  and  abstracted  the  types  and  primitive  operations  of  the  routines. 
Thus,  these  modules  provide  a  set  of  generic  library  routines  and  abstract  datatypes 
(balanced  binary  trees,  matrix  multiply,  list  sort)  that  can  be  used  at  any  type.  During 
development,  programmers  are  likely  to  use  such  modules.  I  wanted  to  determine  what 
the  costs  are  of  holding  the  types  abstract  and  separately  compiling  the  modules  from 
their  uses. 

Table  8.7  describes  the  resulting  benchmarks.  The  running  times  of  these  benchmarks 
relative  to  SML/NJ  are  graphed  in  Figure  8.9,  and  the  raw  numbers  are  given  in  Table 
8.8. On the whole, the TIL programs run roughly as well as or better than their SML/NJ
counterparts. Only logic_s, msort0, and msort1 are slower, and by no more than 20%.

Figure  8.10  compares  the  running  times  of  each  of  the  separately  compiled  programs  to 
the  comparable,  globally  compiled  benchmark  of  the  previous  section.  For  the  non-matrix 
benchmarks,  we  see  about  a  10-20%  overhead  in  separate  compilation.  For  the  matrix 
benchmarks, we see over a 350% overhead. The difference between the fmult0/imult0
and fmult1/imult1 bars is because fmult0 and imult0 must perform dynamic type
dispatch to select an array operation, whereas fmult1 and imult1 do not. Hence, we
see  that  most  of  the  overhead  of  separate  compilation  is  not  due  to  type  abstraction,  but 
rather  to  the  fact  that  the  primitive  operations  (multiplication  and  addition)  are  held 
abstract. 
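
The two sources of overhead can be separated in a small model. The Python sketch below is only an illustration under stated assumptions: `Rep`, `make_array`, and `sub` are hypothetical stand-ins for a run-time type representation and a type-dispatching array subscript, and packed bytes stand in for a flattened real array. The first dot product holds both the element type and the primitives abstract (in the style of fmult0 and imult0); the second knows the element type but still calls unknown `add` and `mul` functions (in the style of fmult1).

```python
import operator
import struct
from enum import Enum

WORD = struct.calcsize("d")  # bytes per flattened double


class Rep(Enum):
    """Hypothetical run-time type representations passed to generic code."""
    INT = "int"
    REAL = "real"


def make_array(rep, values):
    """Reals are stored flat (packed doubles); other elements one per slot."""
    values = list(values)
    if rep is Rep.REAL:
        return bytearray(struct.pack(f"{len(values)}d", *values))
    return values


def sub(rep, arr, i):
    """Array subscript that inspects the element type at run time."""
    if rep is Rep.REAL:
        return struct.unpack_from("d", arr, WORD * i)[0]
    return arr[i]


def dot_abstract(rep, xs, ys, n, zero, add, mul):
    """Element type and primitives both abstract (style of fmult0/imult0)."""
    acc = zero
    for i in range(n):
        # Every array access dispatches on `rep`.
        acc = add(acc, mul(sub(rep, xs, i), sub(rep, ys, i)))
    return acc


def dot_known_real(xs, ys, n, zero, add, mul):
    """Element type known to be real (style of fmult1): no dispatch on the
    arrays, but add and mul are still unknown functions."""
    acc = zero
    for i in range(n):
        x = struct.unpack_from("d", xs, WORD * i)[0]
        y = struct.unpack_from("d", ys, WORD * i)[0]
        acc = add(acc, mul(x, y))
    return acc


xs = make_array(Rep.REAL, [1.0, 2.0, 3.0])
ys = make_array(Rep.REAL, [4.0, 5.0, 6.0])
print(dot_abstract(Rep.REAL, xs, ys, 3, 0.0, operator.add, operator.mul))
print(dot_known_real(xs, ys, 3, 0.0, operator.add, operator.mul))
```

Both versions still pay for a call to `add` and `mul` on every iteration, which mirrors the observation that abstract primitive operations, rather than abstract types, account for most of the cost.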

These  figures  indicate  that  TIL  provides  a  tradeoff  between  separate  compilation 
and  performance.  During  development,  programmers  can  use  separate  compilation  and 
expect  that  their  code  will  perform  reasonably  well.  Towards  the  end  of  development, 
as  key  routines  are  identified  through  profiling,  programmers  can  specialize  the  types  of 
generic  routines  and  expect  a  modest  gain  in  performance,  without  sacrificing  full  separate 
compilation.  Clients  of  a  specialized  generic  abstraction  need  only  be  re-compiled  if  the 
type  exported  by  that  abstraction  changes.  At  the  very  end  of  development,  when  the 
most  important  abstractions  and  routines  are  identified,  programmers  can  inline  these 
modules  to  get  the  best  performance,  but  at  the  cost  of  separate  compilation. 

8.10.4  The  Effect  of  Flattening 

In this section, I explore the performance effect of the various type-directed flattening
translations in TIL. Of course, we cannot easily determine the entire impact of types on
the system. For instance, it is impossible to determine what effect the tag-free garbage
collector has without building a corresponding tagging collector. Therefore, I have only
examined those uses of types that I can easily turn off and on.

Figure 8.9: TIL Execution Time Relative to SML/NJ for Separately Compiled Programs

Figure 8.10: Execution Time of Separately Compiled Programs Relative to Globally
Compiled Programs

For  each  of  the  benchmarks  described  in  the  previous  section,  I  measured  the  running 
time  and  amount  of  heap  allocation  of  the  program  when  compiled  in  the  following  ways: 

•  B  (Baseline):  We  do  not  flatten  arguments,  constructors,  or  arrays. 

•  FC (Flattened Constructors): We flatten all Enumorrec_c constructors that contain
records. This effectively flattens lists and option datatypes. Dynamic type
dispatch is used when the component type is unknown.

•  CAF (Conventional Argument Flattening): In addition to FC, we examine the
call  sites  of  each  non-escaping  function.  If  each  call-site  applies  the  function  to  a 
known  record,  then  we  flatten  the  function  and  pass  the  components  of  the  record 
directly  as  arguments. 

•  TDAF  (Type-Directed  Argument  Flattening):  In  addition  to  FC,  we  flatten  all 
functions  that  take  records  as  arguments,  and  flatten  all  applications  of  functions 
to  records.  We  use  dynamic  type  dispatch  when  the  argument  type  is  unknown. 

•  FRA (Flattened Real-Arrays): In addition to TDAF, we flatten all polymorphic
arrays of floating point values. We use dynamic type dispatch for array operations
when the component type is unknown.
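
To make the argument-flattening configurations concrete, the following Python sketch models type-directed argument flattening with a run-time fallback. It is a toy under stated assumptions: `RecordRep` and `OtherRep` are hypothetical type representations, and the wrapper rebuilds the record before calling the original function, whereas a compiler would rewrite the function body to use the fields directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordRep:
    """Hypothetical type representation: a record with `arity` fields."""
    arity: int


@dataclass(frozen=True)
class OtherRep:
    """Hypothetical representation of any non-record type."""
    name: str


def flatten_function(rep, f):
    """Give a one-argument function its flattened calling convention."""
    if isinstance(rep, RecordRep):
        def flattened(*fields):
            if len(fields) != rep.arity:
                raise TypeError("wrong number of flattened arguments")
            # Toy only: rebuild the record so the original body can run.
            return f(tuple(fields))
        return flattened
    return f  # non-record arguments are passed unchanged


def apply_flattened(rep, g, arg):
    """Call flattened `g` when the argument type is known only at run time."""
    if isinstance(rep, RecordRep):
        return g(*arg)  # spread the record's fields into separate arguments
    return g(arg)


def add_pair(p):
    return p[0] + p[1]


g = flatten_function(RecordRep(2), add_pair)
print(g(3, 4))                                     # known record: direct call
print(apply_flattened(RecordRep(2), g, (3, 4)))    # unknown type: dispatch
```

When the argument type is statically known to be a record, a caller uses the first form and no dispatch occurs; the second form corresponds to the dynamic type dispatch that TDAF falls back on.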

Tables  8.9  and  8.10  record  the  running  times  (in  seconds)  and  amounts  of  heap  allocation 
(in megabytes) for each program compiled in each configuration. The numbers in parentheses
indicate the ratio to the corresponding baseline. Figure 8.10.4 plots the running
times,  normalized  to  the  baseline;  Figure  8.10.4  plots  the  allocation,  normalized  to  the 
baseline. 

From  the  data,  we  can  conclude  that  flattening  both  constructors  and  arguments  is 
almost  always  worthwhile,  both  in  terms  of  running  times  and  allocation.  All  together, 
the  flattening  phases  provide  an  average  speedup  of  42%  and  decrease  allocation  by  50%. 
The biggest improvements for most benchmarks come from argument flattening.
Furthermore, type-directed argument flattening does as well as, if not better than, conventional
argument flattening in almost all cases, providing an additional speedup of 7% and an
additional decrease in allocation of 9%, on average. This is in part because type-directed
flattening is able to flatten higher-order functions, whereas conventional argument
flattening cannot.

The  increase  in  running  time  for  imult,  when  constructors  are  flattened,  appears  to 
be  an  anomaly  in  the  measurements.  Separate  and  longer  runs  (10  times  each)  indicate 
that  constructor  flattening  has  no  measurable  effect  at  all  on  running  times  or  allocation, 
but  the  original  data  shows  a  19%  increase  in  running  times. 

Flattening real arrays has mixed results on most benchmarks, causing allocation or
running  times  to  vary  up  or  down  slightly.  This  is  not  surprising  since  most  of  the 
benchmarks  do  not  manipulate  floating  point  arrays;  when  floating  point  arrays  are 
flattened,  these  benchmarks  must  perform  dynamic  type  dispatch  when  working  with 
other  array  types.  However,  some  benchmarks  that  do  manipulate  floating  point  arrays, 
notably  fft  and  simple,  show  dramatic  speedups:  fft  shows  an  84%  improvement  in 
running  time  and  an  88%  reduction  in  allocation.  Surprisingly,  the  amount  of  allocation 
for  simple  increases,  but  running  times  decrease.  I  speculate  that,  since  we  box  floating 
point  values  passed  to  other  functions,  this  accounts  for  the  increased  allocation,  since 
values  pulled  out  of  a  flattened  array  must  be  boxed  before  being  passed  to  a  function.  A 
similar  effect  happens  for  fmult.  Furthermore,  these  boxes  are  short-lived  —  lasting  only 
a  function  call  —  and  are  thus  not  preserved  by  the  garbage  collector.  In  contrast,  when 
values  are  boxed  before  being  placed  in  an  array,  the  boxes  may  tend  to  live  longer.  Also, 
as  boxed  arrays  are  updated,  the  generational  collector  must  be  informed  of  any  potential 
generational  conflicts.  This  may  account  for  the  fact  that  simple  allocates  more,  but  runs 
faster  when  floating  point  arrays  are  flattened.  Regardless,  since  flattening  real  arrays 
has  a  negligible  negative  effect  on  the  other  benchmarks,  it  is  a  worthwhile  optimization. 

All  of  these  results  are  consistent  with  results  seen  by  Shao  and  Appel  [110].  The 
advantages  of  my  approach  are  that  (a)  we  can  flatten  data  constructors  without  making 
restrictions  at  the  source  level,  (b)  we  can  flatten  arrays,  (c)  we  need  not  tag  values  for 
garbage  collection  or  polymorphic  equality. 

                Exec. time (s)
Program          TIL        NJ    TIL/NJ
cksum           2.36     11.68      0.20
dict            0.40      0.57      0.70
fft             1.39     15.67      0.09
fmult           0.33      0.50      0.66
imult           1.46      4.93      0.30
kb              1.93      1.74      1.11
lexgen          0.65      2.76      0.24
life            1.29      1.44      0.90
logic           7.98      9.42      0.85
msort           2.72      1.82      1.49
pia             0.38      1.11      0.34
qsort           0.44      1.31      0.34
sieve           0.39      0.30      1.30
simple          8.51     24.02      0.35
soli            0.31      0.56      0.55
Geo. mean                           0.50

Table 8.2: Comparison of TIL Running Times to SML/NJ

                            Heap alloc. (Mbytes)
Program                        TIL          NJ       TIL/NJ
cksum                      143.897     984.775         0.15
dict                        12.445      38.495         0.32
fft                          9.108     214.853         0.04
fmult                       16.000       0.001    16,000.00
imult                        0.000       0.001         0.00
kb                          36.941      96.761         0.38
lexgen                       8.753     113.405         0.08
life                        25.447      45.259         0.56
logic                      253.053     525.997         0.48
msort                      114.052     121.738         0.94
pia                          5.238      55.142         0.09
qsort                        1.035      35.332         0.03
sieve                        2.525       7.282         0.35
simple                     323.394     826.504         0.39
soli                         0.328      15.606         0.02
Geo. mean (excluding imult)                            0.39

Table 8.3: Comparison of TIL Heap Allocation to SML/NJ

              Phys. mem. (Kbytes)
Program          TIL        NJ    TIL/NJ
cksum            672      1472      0.46
dict            1152      1872      0.62
fft             2672     17592      0.15
fmult            816       936      0.87
imult            504      1208      0.42
kb              2712      3480      0.78
lexgen          1672      2992      0.56
life             816      1208      0.68
logic           6576      4096      1.61
msort          10032      4896      2.05
pia             1376      1592      0.86
qsort           1096      1536      0.71
sieve           2256      2576      0.88
simple          9088     17784      0.51
soli            1000      1120      0.89
Geo. mean                           0.69

Table 8.4: Comparison of TIL Maximum Physical Memory Used to SML/NJ

              Exec. size (Kbytes)
Program          TIL        NJ    TIL/NJ
cksum         32.768    73.840      0.44
dict          24.576    30.720      0.80
fft           40.960    85.128      0.48
fmult        163.840   196.632      0.83
imult        327.680   196.632      1.67
kb            90.112    74.880      1.20
lexgen       271.336   153.824      1.76
life          40.960    20.480      2.00
logic         98.304    51.272      1.92
msort          8.192    18.432      0.44
pia          237.568   149.728      1.59
qsort         16.384    38.936      0.42
sieve          8.192    17.408      0.47
simple       188.416   325.808      0.58
soli          16.384    58.424      0.28
Geo. mean                           0.81

Table 8.5: Comparison of TIL Stand-Alone Executable Sizes to SML/NJ (excluding
runtimes)

                Comp. time (s)
Program          TIL        NJ    TIL/NJ
cksum          12.89      1.62      7.96
dict           11.44      1.57      7.29
fft            11.24      2.13      5.28
fmult           3.88      0.77      5.04
imult           3.88      0.78      4.97
kb             59.42      7.19      8.26
lexgen        262.52     13.60      19.3
life           21.15      2.48      8.53
logic          85.40      7.01     12.18
msort           3.79      0.73      5.19
pia           205.16     15.93     12.89
qsort           7.27      1.25      5.82
sieve           2.52      0.57      4.42
simple        206.46     18.27     11.30
soli           10.52      1.35      7.79
Geo. mean                            5.8

Table 8.6: Comparison of TIL Compilation Times to SML/NJ

Program     Description

dict0       Generic dictionary structure, with key and value types held abstract
            as well as key comparison function. Instantiated with integer keys
            and string values as in the dict benchmark.

dict1       Same as dict0, but with types known. Only the key comparison is held
            abstract.

fmult0      Generic matrix multiply routine, with element type held abstract as
            well as primitive multiplication, addition, and zero values.
            Instantiated with floating point type and values as in the fmult
            benchmark.

fmult1      Same as fmult0, but with the element type known (real). Only the
            primitive multiplication, addition, and zero values are held abstract.

imult0      Generic matrix multiply routine, with element type held abstract as
            well as primitive multiplication, addition, and zero values.
            Instantiated with integer type and values as in the imult benchmark.

imult1      Same as imult0, but with the element type known (int). Only the
            primitive multiplication, addition, and zero values are held abstract.

lexgen_s    Same as lexgen benchmark, but broken into separately compiled modules.

logic_s     Same as logic benchmark, but broken into separately compiled modules.

msort0      Generic list merge sort, with element type held abstract as well as
            comparison operator. Instantiated with integer type and comparison as
            in the msort benchmark.

msort1      Same as msort0, but with the element type known (int). Only the
            comparison operator is held abstract.

Table 8.7: Separately Compiled Benchmark Programs

                Exec. time (s)
Program          TIL        NJ    TIL/NJ
dict0           0.47      0.58      0.81
dict1           0.38      0.57      0.67
fmult0          1.19      1.20      0.99
fmult1          1.01      1.15      0.88
imult0          5.19      7.57      0.67
imult1          3.58      7.57      0.47
lexgen_s        0.72      2.49      0.29
logic_s         9.99      8.74      1.14
msort0          3.24      2.73      1.19
msort1          2.93      2.73      1.07

Table 8.8: Comparison of TIL Execution Times Relative to SML/NJ for Separately
Compiled Programs


Execution time in seconds (ratio to baseline)

Program           B    FC              CAF             TDAF            FRA
cksum          3.92     3.65 (0.93)     3.15 (0.80)     2.29 (0.58)     2.36 (0.60)
dict           0.68     0.64 (0.94)     0.45 (0.66)     0.41 (0.60)     0.40 (0.59)
fft           12.26    11.41 (0.93)    11.20 (0.91)    11.60 (0.95)     1.39 (0.11)
fmult          0.47     0.46 (0.98)     0.37 (0.79)     0.35 (0.74)     0.33 (0.70)
imult          2.80     3.32 (1.19)     1.54 (0.55)     1.47 (0.53)     1.46 (0.52)
kb             2.58     2.49 (0.97)     2.45 (0.95)     2.01 (0.78)     1.93 (0.75)
lexgen         1.23     0.94 (0.76)     0.80 (0.65)     0.71 (0.58)     0.65 (0.53)
life           1.77     1.37 (0.77)     1.33 (0.75)     1.28 (0.72)     1.29 (0.73)
logic         10.45    10.07 (0.96)     8.87 (0.85)     7.92 (0.76)     7.98 (0.76)
msort          6.45     4.37 (0.68)     2.80 (0.43)     2.75 (0.43)     2.72 (0.42)
pia            0.46     0.41 (0.89)     0.37 (0.80)     0.37 (0.80)     0.38 (0.83)
qsort          0.50     0.50 (1.00)     0.44 (0.88)     0.44 (0.88)     0.44 (0.88)
sieve          0.58     0.43 (0.74)     0.43 (0.74)     0.39 (0.67)     0.39 (0.67)
simple        18.99    17.94 (0.94)    15.75 (0.83)    13.40 (0.71)     8.52 (0.45)
soli           0.32     0.31 (0.97)     0.39 (1.22)     0.31 (0.97)     0.31 (0.97)
Geom. mean                   (0.90)          (0.77)          (0.70)          (0.58)

Table 8.9: Effects of Flattening on Running Times

Allocation in Mbytes (ratio to baseline)

Program            B    FC               CAF              TDAF             FRA
cksum         307.99    267.02 (0.87)    205.35 (0.77)    143.90 (0.47)    143.90 (0.47)
dict           35.66     31.73 (0.89)     17.02 (0.48)     12.45 (0.35)     12.45 (0.35)
fft            51.48     51.48 (1.00)     51.00 (0.99)     51.00 (0.99)      9.11 (0.18)
fmult          24.00     24.00 (1.00)     16.00 (0.67)     16.00 (0.67)     16.00 (0.67)
imult          96.00     96.00 (1.00)      0.00 (0.00)      0.00 (0.00)      0.00 (0.00)
kb             54.96     53.52 (0.97)     53.24 (0.97)     36.94 (0.67)     36.94 (0.67)
lexgen         22.77     20.13 (0.88)     17.87 (0.78)      8.75 (0.38)      8.75 (0.38)
life           37.80     26.98 (0.71)     20.51 (0.54)     25.45 (0.67)     25.45 (0.67)
logic         384.37    345.49 (0.90)    292.07 (0.76)    253.05 (0.66)    253.05 (0.66)
msort         270.93    215.70 (0.80)    114.05 (0.42)    114.05 (0.42)    114.05 (0.42)
pia             7.64      6.80 (0.89)      5.33 (0.70)      5.24 (0.69)      5.24 (0.69)
qsort           6.06      6.06 (1.00)      1.04 (0.17)      1.04 (0.17)      1.04 (0.17)
sieve           4.21      2.53 (0.60)      2.53 (0.60)      2.53 (0.60)      2.53 (0.60)
simple        717.65    627.65 (0.87)    469.02 (0.65)    316.41 (0.44)    323.39 (0.45)
soli            0.33      0.33 (1.00)      0.33 (1.00)      0.33 (1.00)      0.33 (1.00)
Geom. mean                     (0.88)           (0.65)           (0.56)           (0.50)
(excluding imult)

Table 8.10: Effects of Flattening on Allocation


Chapter  9 

Summary,  Future  Work,  and 
Conclusions 


In  this  thesis,  I  have  demonstrated  that  compiler  writers  can  take  advantage  of  types  for 
everything  from  enhancing  performance  to  proving  correctness.  The  fundamental  idea 
behind my approach is to use a combination of type-directed translation and dynamic
type  dispatch  to  build  a  language  implementation.  Type-directed  translation  provides  a 
formal  framework  for  specifying  and  proving  the  correctness  of  compiler  transformations, 
whereas  dynamic  type  dispatch  provides  a  means  for  applying  type-directed  translation 
to  languages  with  unknown  or  variable  types. 


9.1  Summary  of  Contributions 

I have presented a core calculus, called λ_i^ML, that provides dynamic type dispatch at both
the term and the constructor levels. I have shown that type-checking λ_i^ML is decidable
and  that  the  type  system  is  sound  with  respect  to  the  operational  semantics. 

I gave examples of type-directed translations for SML-like languages to λ_i^ML-like
languages. These translations demonstrated how function arguments and data structures
could  be  flattened,  how  tag-free  ad-hoc  operations  such  as  polymorphic  equality  could 
be  implemented,  how  the  constraints  of  Haskell-style  type  classes  could  be  encoded,  and 
how  communication  primitives  could  be  strongly  typed,  yet  dynamically  instantiated.  I 
also  demonstrated  how  to  prove  correctness  of  these  translations  using  logical  relations. 

I showed how a key transformation in functional language implementation, closure
conversion, could be implemented as a type-directed and type-preserving translation,
even for languages like λ_i^ML. I also proved the correctness of this translation.

I  developed  a  formal,  yet  intuitive  framework  for  expressing  program  evaluation  that 
makes  the  heap,  stack,  and  registers  explicit.  This  model  of  evaluation  allowed  me 
to address memory management issues that higher-level models leave implicit. I gave a
general  definition  of  garbage  and  a  general  specification  of  trace-based  garbage  collectors. 
I  proved  that  such  collectors  do  not  interfere  with  evaluation.  I  then  showed  how  types 
could  be  used  to  derive  shape  information  during  garbage  collection,  thereby  obviating 
the  need  to  place  tags  on  values  at  run  time.  I  proved  the  correctness  of  this  tag-free 
collection  algorithm  for  monomorphic  languages  and  showed  how  to  extend  the  technique 
to λ_i^ML-like languages. My formulation was at a sufficiently abstract level that the proofs
were  tractable,  yet  the  formulation  was  not  so  abstract  that  important  details  were  lost. 
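
The idea of deriving shape information from types rather than tags can be sketched in a few lines. The Python model below is a simplified illustration, not the TIL collector or the formal specification from the thesis: the heap is a dictionary from hypothetical addresses to raw words, types are either "int" or a pointer to a record with listed field types, and only non-recursive, monomorphic types are handled.

```python
def trace(heap, roots):
    """Return the set of reachable addresses.

    heap:  address -> list of raw words (no tags on any word)
    roots: list of (word, type) pairs
    type:  "int", or ("ptr", [field types]) for a pointer to a record
    """
    live = set()
    work = list(roots)
    while work:
        word, ty = work.pop()
        if ty == "int":
            continue  # a scalar, even if its bits look like an address
        if word in live:
            continue  # already visited
        live.add(word)
        _, field_types = ty
        # The static type, not the data, says which fields are pointers.
        for field, field_ty in zip(heap[word], field_types):
            work.append((field, field_ty))
    return live


pair_of_ints = ("ptr", ["int", "int"])
root_type = ("ptr", ["int", pair_of_ints])

heap = {
    100: [7, 200],    # record: (int, pointer to a pair of ints)
    200: [300, 3],    # pair of ints; 300 is a number here, not a pointer
    300: [1, 2],      # unreachable record
}
print(sorted(trace(heap, [(100, root_type)])))
```

Because the word 300 stored at address 200 has type int, the collector never follows it, so the record at address 300 is correctly treated as garbage without any tag bits.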

Together  with  others,  I  constructed  a  compiler  for  SML  called  TIL  to  explore  the 
practical  issues  of  type-directed  translation  and  dynamic  type  dispatch.  This  compiler 
uses typed intermediate languages based on λ_i^ML for almost all optimizations and
transformations. TIL uses type-directed translation and dynamic type dispatch to flatten
arguments,  to  generate  efficient  representations  of  datatypes,  and  to  specialize  arrays. 
TIL  also  uses  dynamic  type  dispatch  to  support  partially  tag-free  garbage  collection  and 
tag-free  polymorphic  equality.  Finally,  for  a  wide  range  of  programs,  the  code  emitted 
by TIL is as good as or better than code produced by Standard ML of New Jersey.


9.2  Future  Work 

Because  this  thesis  explores  so  many  aspects  of  types  and  language  implementation,  from 
proving  compiler  correctness  to  implementing  tag-free  garbage  collection,  there  are  many 
unresolved  issues  among  each  of  the  topics.  In  this  section,  I  discuss  those  issues  that  I 
feel  are  the  most  important. 

9.2.1  Theory 

From a type-theoretic standpoint, one of the most interesting open issues for λ_i^ML
is extending the language to support dynamic type dispatch on recursive types at the
constructor level. This would allow us to reify a much wider class of type translations
as constructor terms. For instance, the datatype flattening used in TIL can only be
expressed at the meta-level (i.e., in the TIL compiler) and not as a constructor within
the intermediate language Lmli. However, a straightforward extension of Typerec to
generally  recursive  types  is  difficult,  as  this  is  likely  to  break  constructor  normalization, 
and  hence  decidability  of  type  checking. 

Of a related nature, the restriction to predicative polymorphism is suitable for
interpreting ML-like languages. However, this restriction prevents us from compiling
languages based on the original Girard-Reynolds impredicative calculus. Girard has shown
that adding Typerec-like operators to such calculi breaks strong normalization [47], so it
is unlikely that there is a simple calculus that provides both decidable type checking and
impredicative polymorphism. In an impredicative calculus, recursive types (μ), ∀-types,
and ∃-types can all be viewed as primitive constructors of kind (Ω → Ω) → Ω, so to some
degree, the ability to analyze recursive and polymorphic types requires some intensional
elimination form for functions.

Whereas λ_i^ML provides a convenient intermediate form within a compiler, its use as
a  source  language  is  somewhat  problematic.  The  issue  is  that  dynamic  type  analysis 
makes  no  distinction  between  user-level  abstract  types  that  happen  to  have  the  same 
representation.  From  a  compiler  perspective,  the  whole  purpose  of  dynamic  type  dispatch 
is  to  violate  the  very  abstraction  that  a  programmer  establishes.  There  are  a  variety 
of  approaches  that  could  be  taken  to  solve  this  problem.  The  work  of  Duggan  and 
Ophel  [38]  and  Thatte  [116]  on  kind-based  definitions  of  type-classes  seems  promising 
to  me.  The  basic  idea  is  to  allow  users  to  define  new  inductively  generated  kinds  that 
could  be  refinements  or  extensions  of  the  kind  of  monotypes  and,  using  a  combination  of 
type  dispatch  and  methods  corresponding  to  the  new  constructors,  allow  users  to  define 
appropriate  elim  forms  at  both  the  constructor  and  term  levels. 

In  our  previous  work  on  closure  conversion  [92],  we  showed  how  closures  could  be 
represented using a combination of translucent types and existentials. Using this
representation in TIL would allow us to hoist code and environment projections out of
loops,  but  would  greatly  complicate  the  type  system.  In  particular,  the  target  language 
of  closure  conversion  would  need  to  be  impredicative,  and  as  mentioned  earlier,  this  is 
problematic  when  combined  with  dynamic  type  analysis.  However,  I  suspect  that  there 
is  a  simpler  formulation  that  offers  the  same  performance  benefits  without  requiring  full 
translucent  sums  and  existentials. 

Finally, all of these extensions make the underlying proof theory much more difficult.
For instance, in an impredicative setting, we must use some technique like Girard's
method of candidates, instead of simple, set-based logical relations, to prove
translation correctness. Providing simple formulations of these techniques is imperative.

9.2.2  Practice 

From a practical standpoint, we now know that we can generate good code for polymorphic
languages if types are readily available at compile time. Furthermore, we know that
for  at  least  the  applications  studied  here,  types  are  either  known  for  the  most  part,  or  can 
be  made  to  be  known.  However,  I  expect  that  the  degree  of  polymorphism  in  programs 
will  only  increase  as  more  programmers  start  to  use  advanced  languages.  Hence,  the  next 
logical  step  is  to  explore  techniques  to  make  polymorphism  fast  without  constraining  the 
performance of monomorphic code. For example, it would be very profitable in terms of
execution time to hoist typecase expressions out of loops, in an effort to get good code
within the loops. However, hoisting a typecase out of a loop requires that we duplicate
the body of the loop for each arm of the typecase. This may be entirely reasonable for
small loops, such as the dot product of matrix multiply, where we have a small number
of cases in the typecase. However, in general, I feel that some sort of profile-driven
feedback mechanism is needed to determine which typecases should be hoisted.
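
The trade-off can be seen in a small model of the dot product loop. The Python sketch below is illustrative only: the representation names "real" (a flat array of doubles) and "boxed" (an array of pointers to one-field boxes) are assumptions made for the example, and the `if` chains stand in for typecase. The first version evaluates the typecase on every array access; the second evaluates it once and pays for that with one copy of the loop body per arm.

```python
from array import array


def read(rep, arr, i):
    """A typecase on the element representation, run on every access."""
    if rep == "real":    # flat array of doubles
        return arr[i]
    if rep == "boxed":   # array of pointers to one-field boxes
        return arr[i][0]
    raise ValueError(f"unknown representation: {rep}")


def dot_typecase_in_loop(rep, xs, ys):
    acc = 0.0
    for i in range(len(xs)):
        acc += read(rep, xs, i) * read(rep, ys, i)  # two typecases per iteration
    return acc


def dot_typecase_hoisted(rep, xs, ys):
    # One typecase before the loop; the loop body is duplicated, once per arm.
    if rep == "real":
        acc = 0.0
        for i in range(len(xs)):
            acc += xs[i] * ys[i]
        return acc
    if rep == "boxed":
        acc = 0.0
        for i in range(len(xs)):
            acc += xs[i][0] * ys[i][0]
        return acc
    raise ValueError(f"unknown representation: {rep}")


flat_x, flat_y = array("d", [1.0, 2.0, 3.0]), array("d", [4.0, 5.0, 6.0])
boxed_x, boxed_y = [[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]]
print(dot_typecase_in_loop("real", flat_x, flat_y))
print(dot_typecase_hoisted("boxed", boxed_x, boxed_y))
```

With two arms the duplication is cheap, but the code size grows with the product of loop size and number of arms, which is why some selection mechanism is needed for larger loops.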

Another practical point that needs to be addressed is the issue of unboxing floating
point values as function arguments and within data structures other than arrays.
Unboxing floating point values in function arguments using the vararg/onearg approach is
problematic when there are a large number of argument registers and the underlying
machine has split integer and floating point registers. The problem is that for k arguments,
vararg must be able to generate 2^k coercions to deal with all of the possible calling
conventions. This approach is impractical if k is greater than a fairly small constant (i.e.,
5 or 6).
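
The growth in the number of coercions follows from a simple count: with split register files, each argument lives in one of two register classes, so every additional argument doubles the number of distinct calling conventions. A minimal Python sketch of that enumeration (the names are illustrative, not taken from TIL):

```python
from itertools import product

REGISTER_CLASSES = ("int", "float")


def calling_conventions(k):
    """Every assignment of k arguments to integer or floating point registers."""
    return list(product(REGISTER_CLASSES, repeat=k))


for k in range(1, 7):
    print(k, len(calling_conventions(k)))
```

Each tuple produced here would need its own tailored coercion, so the count is already in the dozens by the time k reaches 5 or 6.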

Fortunately, there is an alternative approach. When vararg is applied to a function
f that is expecting one argument, we generate a closure that contains a runtime routine
that works as follows: when the routine is called, it spills all of the possible argument
registers to the stack. Then, using the type of f, the routine determines which registers
actually contained arguments. Next, the routine allocates a record on the heap and copies
the argument values into the record. Finally, the routine calls f, passing this record to
the function. A primitive corresponding to onearg would have the opposite functionality.
These primitives are likely to be more expensive than the tailored conversions that TIL
currently uses, and for conventional SML code, which rarely manipulates floating point
values, it is not clear that the benefits would justify the costs.
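
The steps of such a generic routine can be simulated as follows. This Python sketch is a model under stated assumptions rather than a description of any implemented runtime: register files are plain lists, the "type of f" is given as a tuple of "int" and "float" field types, and the calling convention (each argument goes in the next free register of its class, in order) is hypothetical.

```python
def generic_vararg(f, arg_types):
    """Wrap one-argument `f` so it can be entered with arguments in registers.

    arg_types: field types of f's record argument, each "int" or "float".
    """
    def routine(int_regs, float_regs):
        # Step 1: spill every possible argument register to the stack.
        stack = {"int": list(int_regs), "float": list(float_regs)}
        # Step 2: use the type of f to find which spilled slots held arguments.
        next_slot = {"int": 0, "float": 0}
        fields = []
        for ty in arg_types:
            fields.append(stack[ty][next_slot[ty]])
            next_slot[ty] += 1
        # Step 3: allocate a record and copy the argument values into it.
        record = tuple(fields)
        # Step 4: call f, passing the record.
        return f(record)
    return routine


def weighted_sum(record):
    count, weight, offset = record
    return count * weight + offset


entry = generic_vararg(weighted_sum, ("int", "float", "int"))
# Two integer arguments and one floating point argument; unused registers hold junk.
print(entry([4, 10, 0, 0], [2.5, 0.0]))
```

A single routine of this kind replaces the whole family of tailored coercions, at the price of spilling registers that turn out not to hold arguments.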

There are enough tradeoffs in data representations that it is not clear that, for
instance, flattening floating point values in records would be worth the cost. Certainly,
flattening floating point values in arrays has mixed benefits. There are other issues
that should be addressed as well, including alignment, bit fields, "endian-ness", word
size, and so forth. Fortunately, TIL provides an excellent framework to explore these
representation tradeoffs.

Clearly, the compile times of TIL are a current problem, and we have yet to address
this issue. Initial tests confirm that the size of the type information on intermediate
terms is quite large. However, most optimizations, including common sub-expression
elimination, do not optimize types that decorate terms. Rather, the optimizer only
processes constructors that are bound via a let-construct at the term level. By extending
the optimizations to process types, it may be possible to reduce the size of terms
significantly, and hence the compile times. Alternatively, we could use a representation
where as little type and kind information as possible remains on terms, and reconstruct
this information as needed.
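The second alternative, keeping minimal annotations and reconstructing types on demand, can be sketched for a much simpler language than TIL's intermediate forms. The example below assumes a simply-typed lambda calculus in which only lambda binders carry a type; all class and function names are hypothetical, and the sketch says nothing about kinds or constructor-level computation.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Int:
    pass


@dataclass(frozen=True)
class Arrow:
    dom: "Type"
    cod: "Type"


Type = Union[Int, Arrow]


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Lam:
    param: str
    param_type: Type  # the only type annotation kept on terms
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lit, Lam, App]


def type_of(term: Term, env: Dict[str, Type]) -> Type:
    """Reconstruct the type of a term from binder annotations alone."""
    if isinstance(term, Lit):
        return Int()
    if isinstance(term, Var):
        return env[term.name]
    if isinstance(term, Lam):
        body_type = type_of(term.body, {**env, term.param: term.param_type})
        return Arrow(term.param_type, body_type)
    if isinstance(term, App):
        fn_type = type_of(term.fn, env)
        if not isinstance(fn_type, Arrow) or fn_type.dom != type_of(term.arg, env):
            raise TypeError("ill-typed application")
        return fn_type.cod
    raise TypeError("unknown term")


identity = Lam("x", Int(), Var("x"))
```

Here applications and variables carry no type at all; their types are recomputed when an optimization asks for them, trading space on terms for time spent in `type_of`.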

There is a wealth of issues to explore with respect to tag-free garbage collection.
For example, it would be good to build both a fully tagging and a fully tag-free
implementation of TIL to explore the costs and benefits of our current approach.
Fortunately, we abstracted many details of the garbage collection implementation in the
higher levels of the compiler so that these experiments would someday be possible.


9.3  Conclusions 

For compiler writers, types provide a means of encapsulating complex invariants and
analysis information. Type-directed translation shows how we can take this analysis
information from the source level and transform it, with the program, so that
intermediate levels can take advantage of this information. In this respect, the type system of
λ_i^ML is far more powerful than conventional polymorphic calculi because it encapsulates
control-flow information (i.e., Typerec). However, unlike a fully reflective language, λ_i^ML
is sufficiently restricted that we can automatically normalize types and compare them.
These restrictions make proofs of compiler correctness tractable, and tools like the type
checker for TIL's intermediate forms possible.

A key advantage that type-directed translation has over traditional compiler
transformations is that, for languages like SML, type information is readily available.
Programmers must specify the types of imported values and often these types do not involve
variables. I took advantage of this property in TIL to perform argument, constructor, and
array flattening. These type-based transformations are in no way inhibited by
higher-order functions or modules. In contrast, transformations based on data-flow, control-flow,
or set-based analyses often fail to optimize terms due to a lack of information. For
example, the conventional argument flattener of TIL fails to flatten many functions that
the type-directed flattener does flatten. Even without programmer-supplied type
information, the advances in soft typing [64, 7, 29, 132] provide a means for compiler writers
to take advantage of types.

In  general,  compilers  and  other  kinds  of  system  software  have  real  issues  and  problems 
that  can  serve  as  the  clients  and  driving  force  behind  the  development  of  advanced  type 
systems.  Changing  an  intermediate  language  in  a  compiler  to  take  advantage  of  recent 
advances  is  much  more  tractable  than  changing  a  ubiquitous  source  language.  As  type 
systems  become  more  advanced,  more  information  will  be  available  to  compilers,  enabling 
more  aggressive  transformations.  Thus,  the  real  future  for  both  type  theory  and  compilers 
is  in  their  marriage. 